In this post, we present a way of reviving digital archives.
In some people are found in their nature , the two senses .
Last week, a group of Dutch researchers presented what they called “The next Rembrandt” (reproduced above): a digital, 3-D printed painting that looks like it might have been created by Dutch artist Rembrandt van Rijn. And in a way, it was: the scientists used digital versions of actual Rembrandt paintings to teach a computer algorithm what a Rembrandt looks like. The computer then used this knowledge to create the new painting all by itself. And while some art critics are not happy, for a machine it did a pretty decent job.
The Rembrandt project illustrates one of the most transformative characteristics of the digital archive: how easily we can re-create and re-mix it without doing any harm to the original (if there is such a thing as an original for a digital archive). We can use a digital archive of Rembrandts to make a new Rembrandt.
This was not possible in an analog world. If we have, for instance, an archive of physical copies of Aristotle’s writings, we cannot highlight, extract, or re-arrange the text without harming our books. A digital collection of his works allows us to do this very easily.
Thus, in an effort to translate the ideas of the “Next Rembrandt” project to the field of Rhetoric, Amy Tuttle and I collected digital, English-language copies of major works of Aristotle from MIT’s online Classics Archive and from Project Gutenberg; then we wrote a Python script trying to teach one of the lab computers to write like Aristotle.
At this point, our script (I hesitate to call it “The Next Aristotle” quite yet) is able to come up with grammatical sentences most of the time; with some goodwill, we can even attribute a deeper philosophical meaning to some of them. Here is some of the (unedited) output:
It was in one thing , the case with it .
And it would have an animal to all animals .
It was in their motion , the one .
(Full disclosure: the Aristotle “quote” at the beginning of this post was also created by the script.) It helps that humans usually try hard to find some kind of meaning in any sequence of words; when we encounter one “in the wild”, we unconsciously add or re-arrange words to make sense of what we’re reading.
But the algorithm will also write things like:
It remains that in its nature is more or that they must not the whole .
For the former of a sense .
In this reason that it has two things , the other is an animal .
Sometimes, our Aristotle script will unfortunately create straight-up word rubbish.
The fact .
It remains for they would then , the other is a body which they have the body ; e.g
Our algorithm consists of essentially two parts.
First, we read all the Aristotle books into the computer to “learn” how Aristotle wrote. This gives us a data set of 1,230,000 words. We count how often words and combinations of words occur. That allows us to calculate the probability of word X following word Y: for instance, how likely is it that the word “the” is followed by “dog” in Aristotle’s works? If we teach our machine right, it will assign a higher probability to “dog” than to, for instance, “a” or a second “the”.
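To make the counting step concrete, here is a minimal sketch of how such word-pair probabilities could be computed. It is not our actual script: the function names (tokenize, bigram_counts, follow_probability) and the three-sentence sample text are made up for illustration, and the real corpus is of course much larger.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Split into words and punctuation marks, so "." becomes its own token.
    return re.findall(r"[A-Za-z']+|[.,;:!?]", text)


def bigram_counts(tokens):
    # For every word, count which words directly follow it and how often.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def follow_probability(counts, prev, nxt):
    # Relative frequency: how often `nxt` follows `prev`,
    # divided by how often anything follows `prev`.
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Hypothetical stand-in for the Aristotle corpus.
sample = "The dog is an animal . The dog has a body . The soul is not a body ."
counts = bigram_counts(tokenize(sample))

print(follow_probability(counts, "The", "dog"))
print(follow_probability(counts, "The", "The"))
```

In this toy text, "The" is followed by "dog" in two of its three occurrences and never by a second "The", which is the kind of difference the counts are meant to capture.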
In a second step, we generate new sentences out of this knowledge. For this specific project, we just start with a “.” and then successively pick the words to follow it based on probability: as a first step, we randomly pick from the 20 words most likely to follow “.”. Assuming we happen to pick “The”, we then repeat the procedure for “The”, and so on, until we hit another full stop, exclamation mark, or question mark.
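The generation loop can be sketched in a few lines as well. Again, this is an illustration under assumptions rather than our actual code: the names build_counts and generate_sentence are hypothetical, the pick among the top 20 candidates is assumed to be uniform, the max_words cap is added only as a safety net, and the tiny sample text stands in for the real corpus.

```python
import random
from collections import Counter, defaultdict

END_MARKS = {".", "!", "?"}


def build_counts(tokens):
    # For every token, count which tokens directly follow it.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate_sentence(counts, top_k=20, max_words=40, rng=random):
    # Start from a full stop, then keep picking one of the top_k most
    # frequent followers of the current token (uniform pick: an assumption).
    current = "."
    words = []
    while len(words) < max_words:  # safety cap, not part of the description
        candidates = [word for word, _ in counts[current].most_common(top_k)]
        if not candidates:
            break
        current = rng.choice(candidates)
        words.append(current)
        if current in END_MARKS:
            break
    return " ".join(words)


# Hypothetical stand-in for the Aristotle corpus, already split into tokens.
tokens = "The dog is an animal . The soul is not a body . It is one thing .".split()
counts = build_counts(tokens)

print(generate_sentence(counts, rng=random.Random(0)))
```

Because each word is chosen by looking only at the word immediately before it, the output tends to be locally plausible but can drift into nonsense over a longer sentence, which matches the samples shown earlier.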
This is a heavily simplified version of a machine learning approach used in Natural Language Processing; an introduction to the technique can be found here.
The goal of this project was not to create perfect imitations of Aristotle’s writing style, but rather to explore some of the possibilities of the digital archive. It shows how we can use the digital archive to create new digital objects that are based on, but not copied from, the archive. These objects in turn can in some cases be valuable additions to the existing archive or become one in their own right.
Image Credit: The Next Rembrandt Project, Press release: http://thenextrembrandt.pr.co/125449-can-technology-and-data-bring-back-to-life-one-of-the-greatest-painters-of-all-time.