Watermarks hide in plain text
Ted Smalley Bowen,
Technology Research News
Writers are known as much for how they
word things as for what they have to say. Putting this maxim to more prosaic
use, a group of researchers at Purdue have devised a way of using word
substitution and syntactical changes to watermark text documents.
Watermarking a document usually involves embedding an image or symbol
that is invisible to most viewers in order to establish authorship and/or
ownership. It is common to watermark bitmapped multimedia
files, which are literally maps of individual pixels, by changing some
of those pixels. Sound files are commonly watermarked by making frequency changes.
Text formats, which store strings of characters, offer fewer opportunities
to embed visual watermarks. The alternatives include making bitmaps of
text files or changing spacing between letters, words and lines.
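
For comparison, the pixel-level approach mentioned for bitmapped files can be sketched in a few lines. This is only an illustration, assuming a grayscale image held as a list of 0-255 integers and a watermark written into the least significant bit of each pixel. The article does not say which pixel changes real systems make, and the names embed_bits and extract_bits are hypothetical.

```python
def embed_bits(pixels, bits):
    """Return a copy of pixels with each watermark bit stored in the
    lowest bit of one pixel (an assumed, simplified technique)."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to carry the watermark")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_bits(pixels, count):
    """Read back the lowest bit of the first `count` pixels."""
    return [p & 1 for p in pixels[:count]]


pixels = [200, 13, 77, 154, 91, 36, 255, 0]   # made-up grayscale values
watermark = [1, 0, 1, 1, 0, 0, 1, 0]          # made-up watermark bits
marked = embed_bits(pixels, watermark)
recovered = extract_bits(marked, len(watermark))
```

Each pixel changes by at most one brightness level, which is why such marks are hard to see. A text file has no comparable slack: changing the lowest bit of a character code produces a different character.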
The Purdue scheme differs from other watermarking techniques in its use
of the actual words in a document to create patterns that, taken together,
function as a distinguishing mark. In order to read the watermark, the
user needs the unique encryption key used to create it. While not watermarking
in the sense of embedding discernible images in a document, the Purdue
natural language scheme is functionally similar.
An advantage of the scheme is that the word patterns can withstand edits and
revisions, according to the researchers. "Even if a watermark-carrying
sentence is modified, the watermark [bits] it stores will survive some
changes, such as replacing words by their synonyms, and a watermark bit
has a 50-percent chance of surviving a drastic change to the sentence,"
said Mikhail Atallah, professor of computer science at Purdue.
Future implementations of the scheme could survive translation to other
human languages, according to Atallah.
The scheme does have some drawbacks, however. Because it changes some
of the language of the document, it is inherently incompatible with text,
like creative writing, whose meaning requires precise and unique syntax.
The changes include inserting an extra phrase in a sentence, splitting
a sentence, adding transitional words or converting sentences to that
bane of English teachers and editors -- the passive voice.
"It's designed for situations where style is not so important, such as
in government documents [and] user manuals, [although] the current prototype
is fine for precise technical writing," said Atallah.
It also requires that the document be at least several dozen sentences
long in order to provide enough text in which to sequester the watermarks.
"The watermark is hidden in a [relatively] small number of sentences.
So the technique is not suitable for watermarking very short text," Atallah said.
To test the scheme, the researchers wrote a program that uses computer-based
natural language processing to determine which words, phrases or arrangements
of words can be altered without changing the document's meaning, and then
makes the changes to watermark the text.
The changed passages are represented as syntactic tree diagrams, which
are in turn converted mathematically to streams of bits. In their experiments,
the researchers were able to hide a 26-bit watermark in text 50 sentences long.
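
The idea of turning a sentence's syntactic tree into key-dependent bits can be illustrated with a toy sketch. It is not the Purdue researchers' algorithm. It assumes a parse tree written as nested tuples, uses an HMAC over the tree's structure as a stand-in for the mathematical conversion, and keeps one bit per sentence. The functions tree_to_string, tree_bit and embed_bit, the sample sentences and the key are all hypothetical.

```python
import hashlib
import hmac


def tree_to_string(tree):
    """Serialize a (label, child, ...) tuple. Leaf words are replaced by
    a placeholder so that only the syntactic structure is kept."""
    if isinstance(tree, str):
        return "w"
    label, *children = tree
    return "(" + label + " " + " ".join(tree_to_string(c) for c in children) + ")"


def tree_bit(tree, key):
    """Derive one bit from the tree structure and a secret key (bytes)."""
    message = tree_to_string(tree).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return digest[0] & 1


def embed_bit(variants, key, wanted):
    """Pick the first meaning-preserving variant whose bit equals `wanted`.
    Returns None if no variant fits, in which case the sentence is skipped."""
    for tree in variants:
        if tree_bit(tree, key) == wanted:
            return tree
    return None


key = b"example-secret-key"  # assumed key, for illustration only

active = ("S", ("NP", "the", "team"),
          ("VP", "tested", ("NP", "the", "scheme")))
passive = ("S", ("NP", "the", "scheme"),
           ("VP", "was", "tested", ("PP", "by", ("NP", "the", "team"))))
synonyms = ("S", ("NP", "the", "group"),
            ("VP", "tried", ("NP", "the", "scheme")))

chosen = embed_bit([active, passive], key, wanted=1)
```

In this sketch a synonym swap leaves the tree structure, and so the bit, unchanged, while a restructured sentence yields an unrelated bit that matches the original only about half the time. Without the key, a reader cannot compute the bits at all.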
In the next phase of the work, the researchers plan to use more complicated
text meaning representation (TMR) trees, which would permit them to associate
the watermarks with the meanings of words rather than their structures.
Because semantics is not directly determined by syntax, the scheme would
be more resistant to changes in the arrangement of words, according to Atallah.
As a key-based system, the prototype is vulnerable, according to Dan Wallach,
an assistant professor of computer science at Rice University. "In general,
they're taking what I would describe as a reasonable approach to their
problem, [but] their security analysis... ignores the effect of an attacker
deciding to insert a new watermark using [the] same system. I'd imagine
that would make the system much easier to defeat," he said.
Although an attacker who doesn't have the key won't necessarily be able
to determine which sentences are original and which were modified, the
attacker could make more widespread and potentially unpleasant changes
to the text to wipe out all the modified sentences, said Wallach.
Another hurdle is modifying the scheme for different languages, he said.
Because the system hinges on being able to find relatively unimportant
sentences in the text that it can safely mutate, it depends on a fairly
complex language model. "This technique would need a separate model for
every language on which it was to work, and heaven only knows what it
would do with slang or with mixed-language writing," said Wallach.
Atallah's colleagues at Purdue were Victor Raskin, Michael Crogan, Christian
Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. The researchers
presented their work at the International Information Hiding Workshop
in Pittsburgh, April 25-27, 2001. The research was funded by Purdue's
Center for Education and Research in Information Assurance and Security (CERIAS).
TRN Categories: Cryptography and Security
Story Type: News
Related Elements: Technical paper, "Natural Language Watermarking:
Design, Analysis, and Proof-of-Concept Implementation" published in the
Proceedings of the 4th International Information Hiding Workshop, Pittsburgh,
Pennsylvania, April 25-27, 2001.