Nat Friedman Embraces AI to Translate the Herculaneum Papyri


This firewood-looking thing is a scroll that might contain a lost literary masterpiece. Advanced scanning and AI technology aims to virtually unroll it, flatten it, and detect minuscule remnants of ink. Courtesy: Vesuvius Challenge

Almost 2,000 years ago, a volcano preserved Herculaneum’s vast library of scrolls but left them unreadable. A volunteer army of nerds has been racing to decipher them.

A few years ago, during one of California’s steadily worsening wildfire seasons, Nat Friedman’s family home burned down. A few months after that, Friedman was in Covid-19 lockdown in the Bay Area, both freaked out and bored. Like many a middle-aged dad, he turned for healing and guidance to ancient Rome. While some of us were watching Tiger King and playing with our kids’ Legos, he read books about the empire and helped his daughter make paper models of Roman villas. Instead of sourdough, he learned to bake Panis Quadratus, a Roman loaf pictured in some of the frescoes found in Pompeii. During sleepless pandemic nights, he spent hours trawling the internet for more Rome stuff. That’s how he arrived at the Herculaneum papyri, a fork in the road that led him toward further obsession. He recalls exclaiming: “How the hell has no one ever told me about this?”

Featured in Bloomberg Businessweek, Feb. 12, 2024. Photos courtesy Vesuvius Challenge

The Herculaneum papyri are a collection of scrolls whose status among classicists approaches the mythical. The scrolls were buried inside an Italian countryside villa by the same volcanic eruption in 79 A.D. that froze Pompeii in time. To date, only about 800 have been recovered from the small portion of the villa that’s been excavated. But it’s thought that the villa, which historians believe belonged to Julius Caesar’s prosperous father-in-law, had a huge library that could contain thousands or even tens of thousands more. Such a haul would represent the largest collection of ancient texts ever discovered, and the conventional wisdom among scholars is that it would multiply our supply of ancient Greek and Roman poetry, plays and philosophy many times over. High on their wish lists are works by the likes of Aeschylus, Sappho and Sophocles, but some say it’s easy to imagine fresh revelations about the earliest years of Christianity.

“Some of these texts could completely rewrite the history of key periods of the ancient world,” says Robert Fowler, a classicist and the chair of the Herculaneum Society, a charity that tries to raise awareness of the scrolls and the villa site. “This is the society from which the modern Western world is descended.”

Friedman (right) and Brent Seales, who’s been working to read the scrolls for 20 years. Photographer: Helynn Ospina for Bloomberg Businessweek

The reason we don’t know exactly what’s in the Herculaneum papyri is, y’know, volcano. The scrolls were preserved by the voluminous amount of superhot mud and debris that surrounded them, but the knock-on effects of Mount Vesuvius charred them beyond recognition. The ones that have been excavated look like leftover logs in a doused campfire. People have spent hundreds of years trying to unroll them—sometimes carefully, sometimes not. And the scrolls are brittle. Even the most meticulous attempts at unrolling have tended to end badly, with them crumbling into ashy pieces.

In recent years, efforts have been made to create high-resolution, 3D scans of the scrolls’ interiors, the idea being to unspool them virtually. This work, though, has often been more tantalizing than revelatory. Scholars have been able to glimpse only snippets of the scrolls’ innards and hints of ink on the papyrus. Some experts have sworn they could see letters in the scans, but consensus proved elusive, and scanning the entire cache is logistically difficult and prohibitively expensive for all but the deepest-pocketed patrons. Anything on the order of words or paragraphs has long remained a mystery.

But Friedman wasn’t your average Rome-loving dad. He was the chief executive officer of GitHub Inc., the massive software development platform that Microsoft Corp. acquired in 2018. Within GitHub, Friedman had been developing one of the first coding assistants powered by artificial intelligence, and he’d seen the rising power of AI firsthand. He had a hunch that AI algorithms might be able to find patterns in the scroll images that humans had missed.

After studying the problem for some time and ingratiating himself with the classics community, Friedman, who’s left GitHub to become an AI-focused investor, decided to start a contest. Last year he launched the Vesuvius Challenge, offering $1 million in prizes to people who could develop AI software capable of reading four passages from a single scroll. “Maybe there was obvious stuff no one had tried,” he recalls thinking. “My life has validated this notion again and again.”

As the months ticked by, it became clear that Friedman’s hunch was a good one. Contestants from around the world, many of them twentysomethings with computer science backgrounds, developed new techniques for taking the 3D scans and flattening them into more readable sheets. Some appeared to find letters, then words. They swapped messages about their work and progress on a Discord chat, as the often much older classicists sometimes looked on in hopeful awe and sometimes slagged off the amateur historians.

On Feb. 5, Friedman and his academic partner Brent Seales, a computer science professor and scroll expert, plan to reveal that a group of contestants has delivered transcriptions of many more than four passages from one of the scrolls. While it’s early to draw any sweeping conclusions from this bit of work, Friedman says he’s confident that the same techniques will deliver far more of the scrolls’ contents. “My goal,” he says, “is to unlock all of them.”

An artist’s rendering of the villa where the scrolls were found. Source: Rocío Espín

Before Mount Vesuvius erupted, the town of Herculaneum sat at the edge of the Gulf of Naples, the sort of getaway wealthy Romans used to relax and think. Unlike Pompeii, which took a direct hit from the Vesuvian lava flow, Herculaneum was buried gradually by waves of ash, pumice and gases. Although the process was anything but gentle, most inhabitants had time to escape, and much of the town was left intact under the hardening igneous rock. Farmers first rediscovered the town in the 18th century, when some well-diggers found marble statues in the ground. In 1750 one of them collided with the marble floor of the villa thought to belong to Caesar’s father-in-law, Senator Lucius Calpurnius Piso Caesoninus, known to historians today as Piso.

During this time, the first excavators who dug tunnels into the villa to map it were mostly after more obviously valuable artifacts, like the statues, paintings and recognizable household objects. Initially, people who ran across the scrolls, some of which were scattered across the colorful floor mosaics, thought they were just logs and threw them on a fire. Eventually, though, somebody noticed the logs were often found in what appeared to be libraries or reading rooms, and realized they were burnt papyrus. Anyone who tried to open one, however, found it crumbling in their hands.

Terrible things happened to the scrolls in the many decades that followed. The scientif-ish attempts to loosen the pages included pouring mercury on them (don’t do that) and wafting a combination of gases over them (ditto). Some of the scrolls have been sliced in half, scooped out and generally abused in ways that still make historians weep. The person who came the closest in this period was Antonio Piaggio, a priest. In the late 1700s he built a wooden rack that pulled silken threads attached to the edge of the scrolls and could be adjusted with a simple mechanism to unfurl the document ever so gently, at a rate of 1 inch per day. Improbably, it sort of worked; the contraption opened some scrolls, though it tended to damage them or outright tear them into pieces. In later centuries, teams organized by other European powers, including one assembled by Napoleon, pieced together torn bits of mostly illegible text here and there.

You can imagine why trying to just pull one of these open really hard didn’t go well. Courtesy: Vesuvius Challenge

Today the villa remains mostly buried, unexcavated and off-limits even to the experts. Most of what’s been found there and proven legible has been attributed to Philodemus, an Epicurean philosopher and poet, leading historians to hope there’s a much bigger main library buried elsewhere on-site. A wealthy, educated man like Piso would have had the classics of the day along with more modern works of history, law and philosophy, the thinking goes. “I do believe there’s a much bigger library there,” says Richard Janko, a University of Michigan classical studies professor who’s spent painstaking hours assembling scroll fragments by hand, like a jigsaw puzzle. “I see no reason to think it should not still be there and preserved in the same way.” Even an ordinary citizen from that time could have collections of tens of thousands of scrolls, Janko says. Piso is known to have corresponded often with the Roman statesman Cicero, and the apostle Paul had passed through the region a couple of decades before Vesuvius erupted. There could be writings tied to his visit that comment on Jesus and Christianity. “We have about 800 scrolls from the villa today,” Janko says. “There could be thousands or tens of thousands more.”

In the modern era, the great pioneer of the scrolls is Brent Seales, a computer science professor at the University of Kentucky. For the past 20 years he’s used advanced medical imaging technology designed for CT scans and ultrasounds to analyze unreadable old texts. For most of that time he’s made the Herculaneum papyri his primary quest. “I had to,” he says. “No one else was working on it, and no one really thought it was even possible.”

Progress was slow. Seales built software that could theoretically take the scans of a coiled scroll and unroll it virtually, but it wasn’t prepared to handle a real Herculaneum scroll when he put it to the test in 2009. “The complexity of what we saw broke all of my software,” he says. “The layers inside the scroll were not uniform. They were all tangled and mashed together, and my software could not follow them reliably.”

By 2016 he and his students had managed to read the Ein Gedi scroll, a charred ancient Hebrew text, by programming their specialized software to detect changes in density between the burnt manuscript and the burnt ink layered onto it. The software made the letters light up against a darker background. Seales’ team had high hopes of applying this technique to the Herculaneum papyri, but those were written with a different, carbon-based ink that their imaging gear couldn’t illuminate in the same way.
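The density-contrast idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not Seales’ actual software: the grid of density values, the function name `highlight_dense_ink` and the simple rule that any cell noticeably denser than the median counts as ink.

```python
from statistics import median


def highlight_dense_ink(density, margin=0.2):
    """Mark cells whose density exceeds the median by more than `margin`.

    `density` is a hypothetical 2D grid (a list of rows) of scan densities.
    Returns a grid of 1 (bright, likely ink) and 0 (dark background).
    """
    flat = [value for row in density for value in row]
    baseline = median(flat)  # typical density of the bare writing surface
    return [[1 if value - baseline > margin else 0 for value in row]
            for row in density]


# Toy scan: background near 1.0, one denser vertical stroke near 1.5.
scan = [
    [1.00, 1.02, 1.50, 0.98],
    [0.99, 1.01, 1.48, 1.00],
    [1.01, 1.00, 1.52, 1.02],
]
mask = highlight_dense_ink(scan)
for row in mask:
    print("".join("#" if cell else "." for cell in row))

# Ink with no density contrast produces an empty mask.
flat_mask = highlight_dense_ink([[1.0, 1.0], [1.0, 1.0]])
```

When the ink is no denser than the surface beneath it, as with the second toy grid, a rule like this finds nothing, which is roughly the wall the team hit with the carbon-based ink.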

Over the past few years, Seales has begun experimenting with AI. He and his team have scanned the scrolls with more powerful imaging machines, examined portions of the papyrus where ink was visible and trained algorithms on what those patterns looked like. The hope was that the AI would start picking up on details that the human eye missed and could apply what it learned to more obfuscated scroll chunks. This approach proved fruitful, though it remained a battle of inches. Seales’ technology uncovered bits and pieces of the scrolls, but they were mostly unreadable. He needed another breakthrough.

Running tiny scroll fragments through a particle accelerator yielded valuable training data for contestants’ AI models. Courtesy: Vesuvius Challenge

Friedman set up Google alerts for Seales and the papyri in 2020, while still early in his Rome obsession. After a year passed with no news, he started watching YouTube videos of Seales discussing the underlying challenges. Among other things, Seales needed money. By 2022, Friedman was convinced he could help. He invited Seales out to California for an event where Silicon Valley types get together and share big ideas. Seales gave a short presentation on the scrolls to the group, but no one bit. “I felt very, very guilty about this and embarrassed because he’d come out to California, and California had failed him,” Friedman says.

On a whim, Friedman proposed the idea of a contest to Seales. He said he’d put up some of his own money to fund it, and his investing partner Daniel Gross offered to match it.

Seales says he was mindful of the trade-offs. The Herculaneum papyri had turned into his life’s work, and he wanted to be the one to decode them. More than a few of his students had also poured time and energy into the project and planned to publish papers about their efforts. Now, suddenly, a couple of rich guys from Silicon Valley were barging into their territory and suggesting that internet randos could deliver the breakthroughs that had eluded the experts.

More than glory, though, Seales really just hoped the scrolls would be read, and he agreed to hear Friedman out and help design the AI contest. They kicked off the Vesuvius Challenge last year on the Ides of March. Friedman announced the contest on the platform we fondly remember as Twitter, and many of his tech friends agreed to pledge their money toward the effort while a cohort of budding papyrologists began to dig into the task at hand. After a couple of days, Friedman had amassed enough money to offer $1 million in prizes, along with some extra money to throw at some of the more time-intensive basics.

Friedman hired people online to gather the existing scroll imagery, catalog it and create software tools that made it easier to chop the scrolls into segments and to flatten the images out into something that was readable on a computer screen. After finding a handful of people who were particularly good at this, he made them full members of his scroll contest team, paying them $40 an hour. His hobby was turning into a lifestyle.

The initial splash of attention helped open new doors. Seales had lobbied Italian and British collectors for years to let him scan his first scrolls. Suddenly the Italians were offering up two new scrolls for scanning to provide more AI training data. With Friedman’s backing, a team set to work building precision-fitting, 3D-printed cases to protect the new scrolls on their private jet flight from Italy to a particle accelerator in England. There they were scanned for three days straight at a cost of about $70,000.

Seeing the imaging process in action drives home both the magic and difficulty inherent in this quest. One of the scroll remnants placed in the scanner, for example, wasn’t much bigger than a fat finger. It was peppered by high-energy X-rays, much like a human going through a CT scan, except the resulting images were delivered in extremely high resolution. (For the real nerds: about 8 micrometers.) These images were virtually carved into a mass of tiny slices too numerous for a person to count. Along each slice, the scanner picked up infinitesimal changes in density and thickness. Software was then used to unroll and flatten out the slices, and the resulting images looked recognizably like sheets of papyrus, the writing on them hidden.

The files generated by this process are so large and difficult to deal with on a regular computer that Friedman couldn’t throw a whole scroll at most would-be contest winners. To be eligible for the $700,000 grand prize, contestants would have until the end of 2023 to read just four passages of at least 140 characters of contiguous text. Along the way, smaller prizes ranging from $1,000 to $100,000 would be awarded for various milestones, such as the first to read letters in a scroll or to build software tools capable of smoothing the image processing. With a nod to his open-source roots, Friedman insisted these prizes could be won only if the contestants agreed to show the world how they did it.

An algorithm that can detect tiny amounts of ink on each little piece of a scroll fragment can then combine that data into a unified, legible simulation of how the scroll might have appeared back in 79 A.D. Courtesy: Vesuvius Challenge

Luke Farritor was hooked from the start. Farritor—a bouncy 22-year-old Nebraskan undergraduate who often exclaims, “Oh, my goodness!”—heard Friedman describe the contest on a podcast in March. “I think there’s a 50% chance that someone will encounter this opportunity, get the data and get nerd-sniped by it, and we’ll solve it this year,” Friedman said on the show. Farritor thought, “That could be me.”

The early months were a slog of splotchy images. Then Casey Handmer, an Australian mathematician, physicist and polymath, scored a point for humankind by beating the computers to the first major breakthrough. Handmer took a few stabs at writing scroll-reading code, but he soon concluded he might have better luck if he just stared at the images for a really long time. Eventually he began to notice what he and the other contestants have come to call “crackle,” a faint pattern of cracks and lines on the page that resembles what you might see in the mud of a dried-out lakebed. To Handmer’s eyes, the crackle seemed to have the shape of Greek letters and the blobs and strokes that accompany handwritten ink. He says he believes it to be dried-out ink that’s lifted up from the surface of the page.

Farritor in his basement with his heavy-duty computer and his results. Photographer: Shawn Brackbill for Bloomberg Businessweek

The crackle discovery led Handmer to try identifying clips of letters in one scroll image. In the spirit of the contest, he posted his findings to the Vesuvius Challenge’s Discord channel in June. At the time, Farritor was a summer intern at SpaceX. He was in the break room sipping a Diet Coke when he saw the post, and his initial disbelief didn’t last long. Over the next month he began hunting for crackle in the other image files: one letter here, another couple there. Most of the letters were invisible to the human eye, but 1% or 2% had the crackle. Armed with those few letters, he trained a model to recognize hidden ink, revealing a few more letters. Then Farritor added those letters to the model’s training data and ran it again and again and again. The model starts with something only a human can see—the crackle pattern—then learns to see ink we can’t.
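The train, predict, add, retrain loop described above is a form of self-training, and a toy sketch shows its shape. This is not Farritor’s code: the one-number “features,” the nearest-neighbor stand-in for a model, the `radius` confidence rule and the name `self_train` are all assumptions chosen to keep the example small.

```python
def self_train(seed, unlabeled, radius=0.12, max_rounds=10):
    """Grow a labeled set by repeatedly trusting confident predictions.

    `seed` maps a feature value to a label (1 = ink, 0 = no ink) and stands in
    for the few human-spotted crackle letters. A prediction counts as
    confident only if the point lies within `radius` of a labeled example.
    """
    labeled = dict(seed)
    pool = list(unlabeled)
    added_per_round = []
    for _ in range(max_rounds):
        newly = {}
        for x in pool:
            nearest = min(labeled, key=lambda known: abs(known - x))
            if abs(nearest - x) <= radius:
                newly[x] = labeled[nearest]
        if not newly:
            break  # nothing left that the model is confident about
        labeled.update(newly)  # feed predictions back in as training data
        pool = [x for x in pool if x not in newly]
        added_per_round.append(len(newly))
    return labeled, pool, added_per_round


seed = {0.9: 1, 0.1: 0}
labeled, leftover, added = self_train(seed, [0.8, 0.7, 0.6, 0.2, 0.3, 0.45])
print(added)     # new labels gained in each round
print(leftover)  # points that never became confident
```

Each round reaches a little further from the human-labeled seed, and whatever never clears the confidence bar stays unread.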

A cross-section scan of a scroll on Farritor’s screen. Photographer: Shawn Brackbill for Bloomberg Businessweek

Unlike today’s large language models, which gobble up data, Farritor’s model was able to get by with crumbs. For each 64-pixel-by-64-pixel square of the image, it was merely asking, is there ink here or not? And it helped that the output was known: Greek letters, squared along the right angles of the cross-hatched papyrus fibers.
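To make the per-square question concrete, here is a minimal sketch of tiling an image into 64-by-64 squares and asking a yes-or-no question of each. The helper names `iter_patches` and `ink_map` and the brightness-based stand-in classifier are hypothetical. In Farritor’s case the question was answered by a trained model, which this sketch does not attempt to reproduce.

```python
def iter_patches(height, width, size=64):
    """Yield (top, left) corners of non-overlapping size x size squares."""
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield top, left


def ink_map(image, has_ink, size=64):
    """Ask the yes/no question of every square and collect the answers.

    `image` is a 2D list of pixel values. `has_ink` is any callable that
    takes a patch (a list of rows) and returns True or False.
    """
    height, width = len(image), len(image[0])
    answers = {}
    for top, left in iter_patches(height, width, size):
        patch = [row[left:left + size] for row in image[top:top + size]]
        answers[(top, left)] = bool(has_ink(patch))
    return answers


def mean_above_half(patch):
    """Stand-in classifier (an assumption): is the average pixel above 0.5?"""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0])) > 0.5


# Toy 128 x 128 image in which only the top-right square is "inked".
image = [[1.0 if (y < 64 and x >= 64) else 0.0 for x in range(128)]
         for y in range(128)]
result = ink_map(image, mean_above_half)
```

Because each square yields a single bit, the answers can be laid back out on the grid to form a coarse picture of where the ink sits.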

In early August, Farritor received an opportunity to put his software to the test. He’d returned to Nebraska to finish out the summer and found himself at a house party with friends when a new, crackle-rich image popped up in the contest’s Discord channel. As the people around him danced and drank, Farritor hopped on his phone, connected remotely to his dorm computer, threw the image into his machine-learning system, then put his phone away. “An hour later, I drive all my drunk friends home, and then I’m walking out of the parking garage, and I take my phone out not expecting to see anything,” he says. “But when I open it up, there’s three Greek letters on the screen.”

Around 2 a.m., Farritor texted his mom and then Friedman and the other contestants about what he’d found, fighting back tears of joy. “That was the moment where I was like, ‘Oh, my goodness, this is actually going to work. We’re going to read the scrolls.’”

Soon enough, Farritor found 10 letters and won $40,000 for one of the contest’s progress prizes. The classicists reviewed his work and said he’d found the Greek word for “purple.”

Farritor continued to train his machine-learning model on crackle data and to post his progress on Discord and Twitter. The discoveries he and Handmer made also set off a new wave of enthusiasm among contestants, and some began to employ similar techniques. In the latter part of 2023, Farritor formed an alliance with two other contestants, Youssef Nader and Julian Schilliger, in which they agreed to combine their technology and share any prize money.

Farritor’s first win came from identifying the word “ΠΟΡΦΥΡΑϹ” (“purple”) on the center line here. Courtesy: Vesuvius Challenge

In the end, the Vesuvius Challenge received 18 entries for its grand prize. Some submissions were ho-hum, but a handful showed that Friedman’s gamble had paid off. The scroll images that were once ambiguous blobs now had entire paragraphs of letters lighting up across them. The AI systems had brought the past to life. “It’s a situation that you practically never encounter as a classicist,” says Tobias Reinhardt, a professor of ancient philosophy and Latin literature at the University of Oxford. “You mostly look at texts that have been looked at by someone before. The idea that you are reading a text that was last unrolled on someone’s desk 1,900 years ago is unbelievable.”

The winning entry from Farritor, Nader and Schilliger shows text across 15 columns of one of the scrolls. Courtesy: Vesuvius Challenge

A group of classicists reviewed all the entries and did, in fact, deem Farritor’s team the winners. They were able to stitch together more than a dozen columns of text with entire paragraphs all over their entry. Still translating, the scholars believe the text to be another work by Philodemus, one centered on the pleasures of music and food and their effects on the senses. “Peering at and beginning to transcribe the first reasonably legible scans of this brand-new ancient book was an extraordinarily emotional experience,” says Janko, one of the reviewers. While these passages aren’t particularly revelatory about ancient Rome, most classics scholars have their hopes for what might be next.

There’s a chance that the villa is tapped out—that there are no more libraries of thousands of scrolls waiting to be discovered—or that the rest have nothing mind-blowing to offer. Then again, there’s the chance they contain valuable lessons for the modern world.

That world, of course, includes Ercolano, the modern town of about 50,000 built on top of ancient Herculaneum. More than a few residents own property and buildings atop the villa site. “They would have to kick people out of Ercolano and destroy everything to uncover the ancient city,” says Federica Nicolardi, a papyrologist at the University of Naples Federico II.

Barring a mass relocation, Friedman is working to refine what he’s got. There’s plenty left to do; the first contest yielded about 5% of one scroll. A new set of contestants, he says, might be able to reach 85%. He also wants to fund the creation of more automated systems that can speed the processes of scanning and digital smoothing. He’s now one of the few living souls who’s roamed the villa tunnels, and he says he’s also contemplating buying scanners that can be placed right at the villa and used in parallel to scan tons of scrolls per day. “Even if there’s just one dialogue of Aristotle or a beautiful lost Homeric poem or a dispatch from a Roman general about this Jesus Christ guy who’s roaming around,” he says, “all you need is one of those for the whole thing to be more than worth it.”
