Did Robbie the Robot really learn to read? (book review) (237)

By Dan Currell on June 20, 2021

Posted in Artificial Intelligence, Book Reviews, Diffusion Theory, Legal Innovation

Examining the gap between what machines do and what lawyers do.

A shiver of lawyers reading books is, perhaps, like a school of fish swimming: the fish don’t know the water is wet, and likewise, the lawyers, who may deeply consider what they are reading, will rarely stop to consider what reading is. But because reading is so important to the law, and one of the key capabilities of artificial intelligence (AI) is its growing ability to work with text, it’s worth a moment to pause and consider: what are we doing when we read?

This matters for two reasons. First, whether and to what extent machines can read is an important question if we are to get more out of legal AI right now. AI tools can greatly increase the productivity of lawyers, with attendant benefits to economy and society. But innovations don’t diffuse themselves. People have to adopt them, and for that to happen, especially when group decisions are involved, the basic nature and user benefits of the innovation need to be clear. See, e.g., Post 008 (discussing severe negative implications of group decision making in innovation adoption).

Second, an ever-deeper understanding of what machines can’t do is critical to designing machines that can overcome those barriers so they can do more. In law, this is important because so little legal work actually gets done under current conditions. There is a vast opportunity to increase the accuracy and availability of legal certainty if machines can do more of the work in the corporate context. In the social context, the access to justice gap will not be closed by human lawyers. According to the Legal Services Corporation (LSC), the justice gap is not a gap but a canyon. See The Justice Gap (2017) (86% of civil legal problems reported by low-income Americans receive inadequate or no legal help).

Editor’s note: This is Part II of a summer series on AI and law written by Dan Currell. Post 232 is Part I. wdh

The LSC report focuses on low-income individuals, but the gap is universal: most legal questions go unresolved because the cost of clarity isn’t justified except in the most risky or valuable contexts. Cf. Post 236 (case study built around AI and low-risk commercial contracts).

We take legal uncertainties for granted in the same way we used to take travel uncertainties for granted. Road trips involved an inherent and unnoticed uncertainty about road closures, new construction, and traffic delays. Sometimes we would get lost, of course. We drove along not knowing what was just over the hill, where exactly the next gas station was, or that there was a historical marker of particular interest to us just a mile off the highway to the east. We didn’t know what other people thought about the two restaurants in the next town, or whether one’s bathrooms tended to be cleaner than the other’s.

That uncertainty is all gone now; Google Maps delivers near-total visibility into matters we didn’t know we didn’t know. Wiping out uncertainty creates an enormous amount of value. Even just eliminating the whole business of being lost in a car, stopping to ask for directions, and finding the destination on a combination of bad directions and dead reckoning has delivered huge value.

What if legal uncertainties became rare the way getting lost in a car has become rare?

We are testing the limits of what machines can do, and the potential value is enormous because early experiences with AI show that they can deliver things we never knew we needed. The next frontier: can cars drive themselves on open roads complicated by pedestrians, potholes, cyclists, bad drivers, and construction zones? It seems so.

What about when an ostrich merges into traffic? Things in the real world can be pretty weird.

Less noticed, we are also considering what machines can’t do.

Can a machine be made to have the common sense of an 18-month-old child? DARPA has funded this project. See David Gunning, “Machine Common Sense,” Proposers Day, Oct 18, 2018. Some of the world’s most capable minds are working on it; nobody is close to success. Since most 18-month-old humans are about a year away from being as smart as a typical dog, not only is human-level AGI a long way off, but dog-level AGI also seems well beyond our grasp.

Still, a computer that lacks the common sense of my cockapoo can do incredible things. Text storage and retrieval are among the first things computers were designed for, so we might assume they can read. But we can only say that a computer can read if we know what reading is. And it’s unclear that we do.

The Robot Reading Room

With that uncertainty in mind, we can turn to Robbie the Robot Learns to Read, a children’s board book written by Noah Waisberg, a co-founder of the legal AI firm Kira Systems and one of the authors of AI for Lawyers (reviewed in Part I, Post 232). As the title suggests, this is the story of a robot’s quest to read. I won’t reproduce the book’s fine illustrations here (for scene-setting: Robbie lives in space), but I will briefly lay out its plot:

Robbie the Robot yearned to read,
so he went to the teacher, Ms. Snead.
Ms. Snead said she knew just the trick.
“Rules are what make language tick!”

Ms. Snead is a friendly, bespectacled turtle with the facial expression of someone who thinks reading is mainly about rules. Robbie learns the rules. And Robbie is ambitious! A close look at his bookshelf reveals, inter alia, at least one Danielle Steel novel, a Karel Čapek play in the original Czech, Homer’s Iliad, and Machiavelli’s The Prince.

But alas, the reading rules don’t work when Robbie opens a book:

New situations here, conflicting rules there.
Robbie could only stare and stare.

Poor Robbie. Reading based on rules didn’t work for him. Dispirited, he turns to a wise owl named Alex.

Alex had a different suggestion:
“Just keep reading for examples.”
“The words will make sense
once you have enough samples.”

Apparently, this idea suits Robbie. He dives into the task, scanning thousands of books at great speed.

He built vast language models
including words, patterns, order, position.

He worked day and night,
being literate was his mission.

Robbie learned, after studying heaps,
that you can know a word from the company it keeps.

Now Robbie can read and read.
He tears through books at tremendous speed.

THE END

It’s a cute book, and it conveys the challenges of reading as well as the great power of machines. Kids are intuitive, so hopefully, they will grasp that robots aren’t that much like humans. And the smart ones will get a glimmer of the message that reading has something to do with rules, and something to do with–well, whatever it was that Robbie did to “learn to read.”

So, did Robbie the Robot really learn to read? The denouement of this little cardboard gem is a single line: “Robbie learned, after studying heaps, that you can know a word from the company it keeps.” Is that what reading is?

How people make sense of the world

In the first of this four-part series on AI in law, “Legal’s AI rocket ship will be manned (232),” I laid out two key points about how machine and human intelligence differ. They each bear on the question of what it means to read.

First, humans are facile with abstract frameworks. From the youngest age, we make sense of the world by putting what we observe into categories, and we string those categories together into frameworks. This allows a child to see something new (cat), associate it with a familiar thing (dog), and make inferences (not always correct, don’t wrestle the cat) that allow the child to move forward with both cat and dog after just one or two observations.

This classification behavior is so essential to us that unclassified things can be a source of great stress. Those who suffer from undiagnosed diseases often report a sense of existential relief when a diagnosis is provided—even when the diagnosis doesn’t suggest a cure. Just naming the disease is part of the cure. Like doctors, lawyers are professional classifiers: we will not rest until an uncertain thing has been placed into a category and that category into a framework.

The second point of Part I (232) is that computers currently cannot classify new things and put them into an abstract framework without the help of a human. Even for something as simple as identifying objects, image-recognition AI relies on “data factories” in which humans tag the images, and machines extrapolate from that tagging. One such firm in China claims to employ 300,000 data taggers. See “China’s success at AI has relied on good data,” Economist, Jan. 4, 2020 (noting role of “cheap labor” in making AI work).

Even more abstract concepts, including nearly all legal doctrine, require a world model or framework that can accept specific factual examples into its flexible boundaries in a way that is entirely natural to humans and, for now, entirely alien to machines.

And if computers can’t work with abstract concepts, they certainly can’t analogize from one abstract concept to another. Analogizing in this way is essential to the determination of which framework is “correct”—for example, whether something is a tort or a crime (or both). It may be that a Supreme Court oral argument epitomizes what even the best AI currently cannot do in the slightest, even as appellate advocates increasingly rely on AI-enabled tools (like Casetext’s CARA) to enhance the quality and efficiency of their work.

To take a simpler example, facile framework thinking is what allows humans to encounter a large feathered thing for the first time (if you haven’t watched the ostrich video yet, you really must) and make pretty good judgments about what that thing must be and, importantly, what risks it probably does and doesn’t entail.

This situation is presumably closer to why humans’ innate framework thinking evolved or was created. We don’t need to have actually driven alongside an ostrich before, and we certainly don’t need to have analyzed thousands of validated examples of it, in order to put it into our mental model and imagine its potential universe of next possible moves.

Importantly, most of the work here consists of knowing what is not going to happen, because even if we’ve never seen a live ostrich before, we instantly—in the tiniest fraction of a second—know that the ostrich will not:

sprout a machine gun out of its lower parts and mow us down
start talking to us
bark or meow
draw its sword
accelerate to 100 km/h (wait – will it? But surely not 150 km/h)
disappear
blast off
start to fly (wait, can ostriches fly?)
stop on a dime
suffer an engine fire

We instantly know these things because we have a world model, and machines don’t have those yet. The result is that a machine’s world is populated almost entirely by unknown unknowns: what is not accounted for is utterly foreign. This is the lack of common sense: because AI cannot conjure inferences out of nothing, eliminating millions of possibilities instantly based on a tiny fragment of information, computers are easily confused. AI platforms can also make mistakes that a sane human would never make.

For example, an otherwise highly accurate AI image recognition platform concluded with 99.9% certainty that a picture of Georgetown’s Healy Hall was a triceratops dinosaur once very minor modifications had been made to the image. See Andrew J. Lohn, Hacking AI: A Primer for Policymakers on Machine Learning Cybersecurity at 7-8, CSET (Dec 2020). We can misperceive things, to be sure, and our perception is notoriously unreliable. But we effortlessly avoid such colossal mistakes as thinking a building is a dinosaur, and we do it with near-total reliability. That’s not because our eyesight is better and we can see that it’s a building. It’s not, and sometimes we can’t. It’s because our world model does not include living dinosaurs along the banks of the Potomac.

A computer’s lack of a world model, or framework-blindness, is a key reason why it has no common sense. We can build a humanoid robot with a high degree of manual dexterity and balance, see, e.g., Devin Coldewey, “This robot learns its two-handed moves from human dexterity,” TechCrunch, May 29, 2019, but we cannot yet build one that can pick up a child’s room. That would involve knowing when something was “out of place,” which is a very abstract concept.

As it stands now, computers can discern things like this through machine learning and thousands or millions of observations, often tagged into categories by humans. But even then they are unreliable (cf., triceratops-on-Potomac), and they can never make sense of truly new situations on the first go.

This brings us back to what it means to read.

What is reading?

Whether Waisberg intended to or not, he wrote a very short book that provides a serviceable synopsis of the phonics vs. whole language debate, a long-running fight among educators over the best way to teach kids to read.

To summarize, with phonics we teach kids to memorize a series of rules—specifically that letters correspond to sounds (phonemes), and those are the building blocks of words. With enough memorization and practice, the child becomes a good reader.

In the 1980s, phonics was challenged by an alternative approach called “whole language.” Whole language is based on the idea that good readers are not actually that rules-based. Whole language advocates observe that readers are really responding to a series of contextual cues in addition to the letters on the page. Our perception of all those cues gives us an understanding of what is being said, so whole language pedagogy teaches kids to read in part by teaching letters and sounds, but also by exposing them to pictures and syntax as other ways to grasp the meaning of a text. If a student can get the meaning of a text by using cues other than letters and words, it counts as success.

(For more detail on phonics versus whole language, I highly recommend “At a Loss for Words: How a flawed idea is teaching millions of kids to be poor readers,” APM Reports, Aug 22, 2019.)

There is mounting evidence that kids learn to read best under the rules-based phonics approach, and while whole language is still reasonably common in U.S. schools, even some of its more committed advocates have lately been in retreat. See Sarah Schwartz, “Is This the End of ‘Three Cueing’?” Education Week, Dec 16, 2020.

Robbie wanted to learn to read, and Ms. Snead is clearly a phonics teacher. But Robbie the Robot is – well, he’s a robot. Rules don’t work for Robbie, even though they work for humans. This seems odd, since computers are masterful at mathematics—perfect, even—and math is just a series of rules. So why would rules work so well for math and so poorly for reading?

There are likely many ways to answer that question, but we can start by considering that reading rules aren’t like math rules. Reading “rules” are guidelines that have to be finessed at every turn – especially in English. Robbie couldn’t apply abstract concepts to fill in gaps and inconsistencies in the rules.

Consider, too, the different outputs we are looking for in math versus reading. Mathematical analysis is a closed system: numbers in, numbers out. By contrast, the output of reading is meaning. This is something the whole language advocates got right—and it’s probably why the whole language method has hung around in our schools for several decades even though it’s an inferior way to teach kids to read.

Phonics says that getting a kid to read is a matter of connecting letters to sounds to words, and that’s it. Whole language says there’s a missing step: the reader needs to “get it”; she needs to derive meaning. It would not count as a success if a child correctly sounded out “horse” but never connected it in her mind to the big animal with hooves.

Phonics works anyway because most people have a very easy time getting meaning out of a word once the word is known. The hard part is getting from letters to words, so phonics just teaches human kids how to do that.

That said, the important point for our purposes is that a child has only learned to read when she can look at letters on a page and derive their meaning. And for humans, meaning happens when we can put an idea into a model that we can manipulate and connect to the other imagined elements of our world.

What is meaning?

Nets are for catching fish; after one gets the fish, one forgets the net. Traps are for catching rabbits; after one gets the rabbit, one forgets the trap. Words are for getting meaning; after one gets the meaning, one forgets the words.

— Zhuangzi, Chuang Tsu, Ch. 26

Lurking behind the mechanics of learning to read is the question of meaning. What does it mean to know a word, to arrive at its meaning? It’s a big question, but the entry for “meaning” from the Oxford Dictionary of Philosophy provides a sense of what we’re up against:

meaning: Whatever it is that makes what would otherwise be mere sounds and inscriptions into instruments of communication and understanding. The philosophical problem is to demystify this power, and to relate it to what we know of ourselves and the world.

We know meaning when we experience it. Whatever it is, we are good at it, which is why we’re good at reading. The reading comprehension section of the LSAT is about answering questions that go beyond the text; they show that we got its meaning.

And the relationship between meaning and words goes in both directions. Like Robbie, we too can figure out what words mean from their context, but our context relies on our world-model. It does not require millions of observations. As whole language advocates correctly point out, we regularly put an unfamiliar word into its sentence context while checking it against our framework, giving the word a provisional meaning, or even treating it as a placeholder until its role becomes clearer later on. We can also figure that the word’s meaning isn’t important once we get the broader context of what’s being said.

You likely did several of these things at the start of this piece. Had you ever heard of a shiver of lawyers? You soon inferred that it was like a school of fish—some kind of grouping. That was good enough, really. If you were curious, or annoyed, perhaps you clicked on the link to find that a shiver is the name for a group of sharks. At that point you knew it was a risky attempt at humor, considering the audience, rooted in a cultural trope about lawyers. Or perhaps you just ignored it entirely since you got the point anyway. You got the fish; you didn’t need the net anymore.

Consider how many abstractions, frameworks and world models were necessary to read that first sentence. The shiver of lawyers was like the ostrich merging into traffic: an exceedingly low-frequency sighting taken out of context that a computer might deal with in any number of ways. To us it’s not a big deal, but a computer might, so to speak, decide that it is a triceratops dinosaur.

In truth, Robbie the Robot didn’t really learn to read. He “read” Karel Čapek’s R.U.R. and Isaac Asimov’s I, Robot, but he did not add the ideas in them—their meaning—to his world model because he doesn’t have one.

But Robbie did something, and that something is incredibly valuable. What was it? Robbie learned to “know a word from the company it keeps,” which is to say he associated each word with thousands of other words based on how they appeared in other texts. He knew the other words the same way, so when we consider how Robbie derives meaning, it’s basically turtles all the way down.
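
For readers who want to see the idea in miniature, here is a rough sketch of knowing a word from the company it keeps. It is not how Kira or any commercial product works, and the book does not describe an implementation; the six-sentence corpus, the window size, and the function names are all invented for illustration. Each word is represented only by counts of its neighbors, and two words count as similar when their neighbor counts line up.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus invented for this illustration; not drawn from the book or any real dataset.
CORPUS = [
    "the lawyer read the contract",
    "the lawyer read the statute",
    "the attorney read the contract",
    "the attorney read the brief",
    "the ostrich ran across the road",
    "the ostrich ran across the field",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(CORPUS)
sim_lawyer_attorney = cosine(vectors["lawyer"], vectors["attorney"])
sim_lawyer_ostrich = cosine(vectors["lawyer"], vectors["ostrich"])

print("lawyer vs attorney:", round(sim_lawyer_attorney, 3))
print("lawyer vs ostrich: ", round(sim_lawyer_ostrich, 3))
```

On this toy corpus the program rates “lawyer” as closer to “attorney” than to “ostrich,” purely because the first two words keep the same company. At no point does it hold any notion of what a lawyer or an ostrich is, which is the turtles-all-the-way-down point made above.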

So what is Robbie good for?

I have just insulted Robbie by calling into question his signal achievement (and since he is such a prodigious speed-reader there’s a good chance he’ll have read this within seconds of it going online).

So let me talk about what Robbie can do: he can analyze terms based on their association and usage, and he can do so with a logic that gets better all the time. He can do this instantly, for all practical purposes, and he can do it with an effectively infinite amount of material. Robbie does not, and might not soon, put those words into abstract frameworks in the way one sees in a Supreme Court argument, but he can turn out increasingly refined word and concept associations and produce a result that is distilled and readable by people. Humans can do some of these things, but never with his speed, scale, or consistency.

So what is Robbie good for? In truth, we should ask instead what humans are good for, since based on the description above the answer would appear to be, “not much.” Humans are slow, expensive, and unreliable.

A notable recent contribution to the common understanding of our lack of reliability is the book Noise by Kahneman, Sibony and Sunstein, which documents how inconsistent even expert human judgment is. This human trait of inconsistency has long been well understood by students of process engineering so that a commonly accepted goal of good process is to minimize human touches. Every human touch introduces variability while machine touches mostly don’t. Machine touches are also comparatively inexpensive, and acutely so when the machine is a computer.

I would argue that Robbie is good for every task in which a human is not absolutely required, which is most current legal work. For all the analysis above of world models, categories, and frameworks, the main cost of a significant legal matter is in the arrangement of facts and law in a fashion orderly enough to be analyzed. That analysis requires a lawyer, but it’s a relatively small part of the whole. The AI and automation opportunities in law are therefore enormous, and if things go well, the whole market will bend towards only paying humans to do the relatively few things humans must do.

Concretely, this means that lawyers will need to be able to create patterns out of facts, but they will not need to know how to source and assemble facts. More controversially, lawyers will need to understand how to consume legal research so they can be effective critics of it, but most will not have to be able to do legal research. (Do you need your doctor to be able to do medical research? She probably doesn’t do much of it, because she has access to those who do, and she can pass you along to Mayo Clinic if that’s what you really need.)

As mentioned at the outset, one reason to care about all of this is so we can get the most out of current AI. That matters for obvious short-term reasons of enhancing quality and reducing cost, both of which are easy to do in light of the high cost and even higher variability of the legal work we do now. It matters even more because of the big-picture opportunity to identify and remove the vast universe of unnoticed legal uncertainties that we take for granted.

For example, companies never have a catalog of all their contracts, let alone a complete grasp of contract provisions and what they mean for the business. Companies never enter a new line of business, new state, or new country with a complete sense of the legal and regulatory risks involved. They make choices about which risks to take—some bundle of known and unknown unknowns that will be dealt with by talking to old hands, chatting with competitors and peers, and hoping for the best.

This is ridiculous: law is written down and publicly available. We just don’t have the tools to digest it all yet. And the breadth of the legal system’s ignorance is not limited to the public. In a short two-year turn in the federal government, my three primary areas of work all involved statutes that were passed in the 1970s or 1980s and then essentially forgotten by everyone, including the relevant agency, for years or decades thereafter. At some point an interested lawyer or regulator came across the provision, like Hilkiah finding Deuteronomy in the temple attic after it had apparently been misplaced for a decade or five. In all three cases a considerable mess ensued.

The downstream effect of using machines to fix these kinds of problems could be to make a dent in the yawning access-to-justice gap mentioned at the top of this piece—people who don’t have a will, solo entrepreneurs who are unlicensed or unincorporated, litigants who are unrepresented or uninformed (or, usually, both). AI may not be able to solve the “unrepresented” problem for a while, but surely it can help with “uninformed.”

A second reason to care about how machines deal with text is so we can mitigate or overcome what I have called AI’s framework blindness or lack of a world model. Greater minds than mine are working on the problem, but we appear to be at least one paradigm shift away from figuring it out. I am personally hopeful that the development of quantum computing and the very different software it requires will lead us down a new and more productive path towards AGI.

But for present purposes, I offer the following observation: law is the most richly funded and well-documented human endeavor in which frameworks are ends in themselves. When a machine’s framework-blindness imposes limitations on its work in epidemiology or engineering, we can just look at outcomes. If a doctor or a computer can diagnose a patient’s condition with enough accuracy, it doesn’t really matter if we can explain how the answer was derived.

But in law, the explanation is the whole game, and explanations are all about frameworks and world models. Because of this, legal AI might be best positioned to crack the framework blindness of machines and, perhaps, finally send us down the path to true human-level artificial intelligence.