Watermarks hide in plain text

By Ted Smalley Bowen, Technology Research News

Writers are known as much for how they word things as for what they have to say. Putting this maxim to more prosaic use, a group of researchers at Purdue University has devised a way of using word substitution and syntactical changes to watermark text documents.

Watermarking a document usually involves embedding an image or symbol that is invisible to most viewers in order to establish authorship and/or ownership. It is common to watermark bitmapped multimedia files, which are literally maps of individual pixels, by changing some of those pixels. Sound files are commonly watermarked by making frequency changes.

Text formats, which store strings of characters, offer fewer opportunities to embed visual watermarks. The alternatives include making bitmaps of text files or changing spacing between letters, words and lines.

The Purdue scheme differs from other watermarking techniques in its use of the actual words in a document to create patterns that, taken together, function as a distinguishing mark. In order to read the watermark, the user needs the unique encryption key used to create it. While not watermarking in the sense of embedding discernible images in a document, the Purdue natural language scheme is functionally similar.
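To make the role of the key concrete, here is a minimal Python sketch of a keyed text watermark. It is not the Purdue algorithm: it simply picks, for each carrier sentence, a paraphrase whose keyed hash bit matches the desired watermark bit. The function names (keyed_bit, embed, read), the use of HMAC-SHA256 and the sample paraphrases are all assumptions made for illustration.

```python
import hashlib
import hmac


def keyed_bit(key: bytes, sentence: str) -> int:
    """Derive one pseudo-random bit from the secret key and the sentence text."""
    digest = hmac.new(key, sentence.encode("utf-8"), hashlib.sha256).digest()
    return digest[0] & 1


def embed(key: bytes, variants: list[list[str]], bits: list[int]) -> list[str]:
    """For each carrier sentence, keep the first paraphrase whose keyed bit matches."""
    marked = []
    for options, bit in zip(variants, bits):
        match = next((s for s in options if keyed_bit(key, s) == bit), None)
        if match is None:
            raise ValueError("no paraphrase encodes the required bit")
        marked.append(match)
    return marked


def read(key: bytes, sentences: list[str]) -> list[int]:
    """Recompute the bits; a different key yields an unrelated bit string."""
    return [keyed_bit(key, s) for s in sentences]


# Hypothetical paraphrase sets, one list per carrier sentence.
VARIANTS = [
    ["The committee approved the budget.",
     "The budget was approved by the committee.",
     "In the end, the committee approved the budget."],
    ["Users must restart the device.",
     "The device must be restarted by users.",
     "After that, users must restart the device."],
    ["The report lists three options.",
     "Three options are listed in the report.",
     "The report, in fact, lists three options."],
]
```

Because this sketch hashes the surface text, any edit to a carrier sentence can flip its bit. The Purdue scheme, as described in this article, derives its bits from sentence structure instead, which is why it can tolerate synonym substitution.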

An advantage of the scheme is that the word patterns can withstand edits and revisions, according to the researchers. "Even if a watermark-carrying sentence is modified, the watermark [bits] it stores will survive some changes, such as replacing words by their synonyms, and a watermark bit has a 50-percent chance of surviving a drastic change to the sentence," said Mikhail Atallah, professor of computer science at Purdue.

Future implementations of the scheme could survive translation to other human languages, according to Atallah.

The scheme does have some drawbacks, however. Because it changes some of the language of the document, it is inherently incompatible with text such as creative writing, whose meaning depends on precise and unique syntax. The changes include inserting an extra phrase in a sentence, splitting a sentence, adding transitional words or converting sentences to that bane of English teachers and editors -- the passive voice.

"It's designed for situations where style is not so important, such as in government documents [and] user manuals, [although] the current prototype is fine for precise technical writing," said Atallah.

It also requires that the document be at least several dozen sentences long in order to provide enough text in which to sequester the watermarks. "The watermark is hidden in a [relatively] small number of sentences. So the technique is not suitable for watermarking very short text," Atallah said.

To test the scheme, the researchers wrote a program that uses computer-based natural language processing to determine which words, phrases or arrangements of words can be altered without changing the document's meaning, and then makes the changes to watermark the text.

The changed passages are represented as syntactic tree diagrams, which are in turn converted mathematically to streams of bits. In their experiments, the researchers were able to hide a 26-bit watermark in text 50 sentences long.
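As a rough illustration of how a syntactic tree can be turned into bits, the Python sketch below encodes only the shape of a parse tree, writing 1 when it enters a node and 0 when it leaves. This balanced-parentheses encoding is a stand-in chosen for simplicity, not the conversion the researchers used; the tuple tree format, the function name shape_bits and the hand-built example trees are assumptions.

```python
def shape_bits(tree: tuple) -> str:
    """Encode tree shape only: '1' on entering a node, '0' on leaving it.

    A node is a tuple (label, child, ...); a leaf is (label, "word"),
    so the words themselves never influence the result.
    """
    _label, *children = tree
    inner = "".join(shape_bits(c) for c in children if isinstance(c, tuple))
    return "1" + inner + "0"


# Hand-built parse trees (hypothetical; a real system would use a parser).
ACTIVE = ("S",
          ("NP", ("DT", "the"), ("NN", "committee")),
          ("VP", ("VBD", "approved"),
           ("NP", ("DT", "the"), ("NN", "budget"))))

# Same structure, with synonyms substituted for two words.
SYNONYM = ("S",
           ("NP", ("DT", "the"), ("NN", "panel")),
           ("VP", ("VBD", "endorsed"),
            ("NP", ("DT", "the"), ("NN", "budget"))))

# Passive-voice rewrite: the structure itself changes.
PASSIVE = ("S",
           ("NP", ("DT", "the"), ("NN", "budget")),
           ("VP", ("VBD", "was"),
            ("VP", ("VBN", "approved"),
             ("PP", ("IN", "by"),
              ("NP", ("DT", "the"), ("NN", "committee"))))))

print(shape_bits(ACTIVE) == shape_bits(SYNONYM))  # True: synonyms keep the shape
print(shape_bits(ACTIVE) == shape_bits(PASSIVE))  # False: passive changes it
```

In this toy encoding a synonym swap leaves the bit string untouched, while a passive-voice rewrite produces a different one. That is the kind of lever a structure-based scheme can pull to set a bit without changing what the sentence says.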

In the next phase of the work, the researchers plan to use more complicated text meaning representation (TMR) trees, which would permit them to associate the watermarks with the meanings of words rather than their structures. Because semantics is not directly determined by syntax, the scheme would be more resistant to changes in the arrangement of words, according to the researchers.

As a key-based system, the prototype is vulnerable, according to Dan Wallach, an assistant professor of computer science at Rice University. "In general, they're taking what I would describe as a reasonable approach to their problem, [but] their security analysis... ignores the effect of an attacker deciding to insert a new watermark using [the] same system. I'd imagine that would make the system much easier to defeat," he said.

Although an attacker who doesn't have the key won't necessarily be able to determine which sentences are original and which were modified, the attacker could make more widespread and potentially unpleasant changes to the text to wipe out all the modified sentences, said Wallach.
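A back-of-the-envelope calculation shows what such a blanket rewrite would do. The Python sketch below takes the two figures quoted in this article, a 26-bit watermark and a 50-percent chance that a bit survives a drastic sentence change, and assumes that every carrier sentence is drastically rewritten and that bits survive independently. Both assumptions are simplifications, and the names used are illustrative only.

```python
from math import comb

WATERMARK_BITS = 26   # watermark length reported for the 50-sentence experiment
SURVIVAL_PROB = 0.5   # quoted per-bit survival chance after a drastic change


def prob_at_least(k: int, n: int = WATERMARK_BITS, p: float = SURVIVAL_PROB) -> float:
    """P(at least k of n bits still match), assuming independent bits."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_matches = WATERMARK_BITS * SURVIVAL_PROB  # 13.0 bits on average
prob_all_survive = prob_at_least(WATERMARK_BITS)   # 0.5 ** 26, about 1.5e-8

print(expected_matches, prob_all_survive)
```

Under these assumptions about half of the bits would still match after the attack, which is no better than chance, and the odds of the full watermark surviving intact are roughly one in 67 million. The calculation says nothing about any error tolerance the actual scheme may build into detection.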

Another hurdle is modifying the scheme for different languages, he said. Because the system hinges on being able to find relatively unimportant sentences in the text that it can safely mutate, it depends on a fairly complex language model. "This technique would need a separate model for every language on which it was to work, and heaven only knows what it would do with slang or with mixed-language writing," said Wallach.

Atallah's colleagues at Purdue were Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. The researchers presented their work at the International Information Hiding Workshop in Pittsburgh, April 25-27, 2001. The research was funded by Purdue's Center for Education and Research in Information Assurance and Security (CERIAS).

Timeline: Now
Funding: University
TRN Categories: Cryptography and Security
Story Type: News
Related Elements:  Technical paper, "Natural Language Watermarking: Design, Analysis, and Proof-of-Concept Implementation" published in the Proceedings of the 4th International Information Hiding Workshop, Pittsburgh, Pennsylvania, April 25-27, 2001.


June 6, 2001


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.