Mark Nelson

Dr. Dobb's Bloggers

Ripoff Artists

November 14, 2010

Nobody likes getting ripped off, and I'm no exception. I search the web from time to time to see who's copying my stuff, and it's always a little disheartening.

This week I ran a check to see who was copying my 20-year-old LZW Compression article. Mind you, I'm not talking about isolated quotes taken without attribution; for the most part I'm looking for people who have posted a wholesale copy of the article - a complete rip-off. Looking through the top 25 hits yields some interesting statistics:

  • About 30% of the people who copy my work are university faculty. They assign the article as reading for a class, and instead of simply posting a link, they scrape the article off the web and post a private copy.
  • Another 40% are people who are blatantly plagiarizing - they've incorporated my work into a paper or thesis. Unfortunately for them, Google now crawls PDF and PostScript files, which makes detection pretty easy.
  • The remainder are blogging programmers who, for some reason, delight in taking my article and posting it on their site, reformatted and unattributed, but often with my name and contact information still intact.

Taking Action

Finding these rip-off artists is easy, but getting the stolen material removed from the web is another matter. In the cases where I can clearly identify a person who owns the site, I usually start with a friendly email. Maybe 25% of the time this works, but the typical response is dead silence.

When the informal methods fail, the next step is the formal takedown notice. In the United States, web publishers enjoy protection from claims of copyright infringement under the Online Copyright Infringement Liability Limitation Act if they register a copyright agent who handles complaints, and if they respond to those complaints in a timely fashion.

This means that a site like Blogger.com, owned by Google, provides a formal mechanism for handling notices. When I can't find a link to an abuse agent, I use the WHOIS database to find the hosting service, and send an email to their address. This generally works pretty well. For example, Scribd responds to my requests within a matter of hours, and generally assumes that my complaints are legitimate unless the poster of the material puts up a decent defense.
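The WHOIS step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of the exact procedure used here: the function names `whois_query` and `abuse_contacts` are hypothetical, the default server `whois.iana.org` is an assumed starting point (it typically just refers you on to the registry that holds the real record), and the parser naively assumes that abuse contacts appear on lines containing the word "abuse". Real WHOIS output varies a lot between registries.

```python
import re
import socket

# Loose e-mail pattern; good enough for picking addresses out of WHOIS text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def whois_query(name, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query (plain text over TCP port 43) and return the reply.

    The default server is an assumption; follow any referral in the reply
    by calling this function again with the referred server.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def abuse_contacts(whois_text):
    """Return lower-cased e-mail addresses from lines that mention 'abuse'."""
    found = []
    for line in whois_text.splitlines():
        if "abuse" not in line.lower():
            continue
        for addr in EMAIL_RE.findall(line):
            addr = addr.lower()
            if addr not in found:  # keep first-seen order, drop duplicates
                found.append(addr)
    return found
```

Calling `abuse_contacts(whois_query("example.com"))` would need network access, and the reply may contain only a referral rather than a contact, so treat the result as a starting point for where to send the notice rather than a definitive answer.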

Things aren't always so simple though. Just as an example, CiteSeer, a very popular database of academic publishing, has a cached copy of a stolen article that their crawler found. In their FAQ, under the question "How can I remove a copy of my article from your database?", they give this unhelpful tidbit:

Papers within CiteSeerX corpus are crawled from the web. The only reason a papers of yours is in the CiteSeerX database is because it was/is available from the web.

No kidding. And this helps me remove your illegal copy how?

The Tough Cases

With enough perseverance, I'm usually able to remove a large percentage of the illegal copies. But some problems remain intractable. Overseas servers in countries where English is not widely spoken are particularly difficult. I could certainly sue Baidu.com in Federal Court, but I have a feeling that wouldn't get me very far.

Even when I don't succeed, there is some entertainment value in the excuses. Today I got an email from a gentleman in India who incorporated my work in a paper published in a peer-reviewed journal. He told me that he would work on taking it out, but right now he is busy taking care of his mother, who is in poor health. He hopes I will be patient.

Patient I will remain. Not like I have a choice.
