China’s Filtering Tech Evolving

The People’s Republic of China blocks roughly 8 percent of online content,
according to an ongoing analysis
by researchers with the Berkman
Center for Internet & Society at Harvard Law School.


While the researchers found that the Chinese government makes some effort
to block sexually-explicit sites, their findings indicate it is much more
interested in blocking content related to news, health and entertainment.

Jonathan Zittrain, co-director of the center and Jack N. and Lillian R.
Berkman Assistant Professor of Entrepreneurial Legal Studies at Harvard Law
School, working with Benjamin Edelman, a technology analyst and first-year
student at Harvard Law School, has been tracking Internet filtering in
China. Of the 204,012 sites the two have tested to date, the researchers
found more than 50,000 to be inaccessible from at least one point in China
on at least one occasion. In an effort to separate sites that are
intentionally blocked from those that were unreachable due to temporary
glitches, they reported that 18,931 sites (8 percent) were inaccessible
from at least two distinct proxy servers in China on at least two days.

While they found some effort had been made to block sexually explicit
content, Zittrain and Edelman found that such efforts were not particularly
effective, blocking only about 13.4 percent of their sample of well-known
sexually-explicit sites. For instance, while Playboy’s and Penthouse’s
sites were blocked, they found the Hustler Magazine site and adult site
whitehouse.com were consistently available. In comparison, the
researchers, who have previously studied Internet filtering in Saudi
Arabia and American public libraries, found that Saudi Arabia blocked
86.2 percent of the same sample. Commercial filtering applications block
between 70 percent and 90 percent of those sites.

On the other hand, China more consistently blocks dissident/democracy sites
like Amnesty International, Human Rights Watch, the Hong Kong Voice of
Democracy, the Direct Democracy Center and Falun Gong and Falun Dafa sites.
It also makes a strong effort to block health-related sites, like the AIDS
Healthcare Foundation, the Internet Mental Health reference, and the Health
in China research project. On the education front, it blocks a number of
university sites, as well as sites for The Learning Channel, the Islamic
Virtual School and the Music Academy of Zheng. Of Google’s top 100 results
for news, 42 were blocked, and the researchers said they found evidence of
blocking of 923 sites listed in Yahoo’s News and Media directory categories
and subcategories. A variety of sites run by governments in Asia and
beyond were blocked, including U.S. court sites, state communications
organs, and state-sponsored travel sites. Taiwanese, Tibetan and religious
sites were consistently blocked, as were several movie sites.


Some of the selections seem a little odd at first, like the apparent
decision to block the site for Red Lobster restaurants. But Edelman
explained, “It does seem the word ‘Red’ may be targeted, perhaps not
surprising for the many instances in which it’s used to refer to China,
especially in a negative light. As applied to any particular site, this is
primarily speculation, but the pattern has become increasingly clear to
me via a casual review of the sites.”


Edelman said that his findings lead him to believe that some of the
rationales for blocking may include:

  • Including content with keywords that lead human or automated
    classifiers to think that a page is hostile to China (whether or not it is)
    or discussing topics to which the Chinese authorities intentionally
    restrict access
  • Previously providing such content
  • Sharing an IP address with one or more sites that contain or previously
    contained such content
  • Sharing a domain name server with one or more sites that contain or
    previously contained such content

“Of these, only the first is, strictly speaking, ‘intentional,’ while the
other categories (and plenty of further classes of sites) are arguably to
varying extents ‘accidental.’ Of course, this all results from an
intentional choice not to take the necessary precautions to prevent such
accidental blocks,” he said.


For the most part, according to the researchers, the primary means of
blocking is at the router level and on the basis of IP address, which means
that those who implement the filtering must choose between blocking an
entire site on the basis of a small portion of the content, or tolerating
the entire site. The analysis also found that the Chinese government is
experimenting with more technology-intensive and refined content filtering,
like blocking by keywords or phrases in any particular HTML page requested
by the user. The researchers said such technology appears to be linked to
the ability to disable Internet access for a period of time for a user who
requests a page with forbidden content. Other nascent forms of filtering
appear to attempt to limit the information that can be gleaned from search
engines.
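
To make the trade-off concrete, here is a minimal sketch of the two approaches the researchers describe. It is an illustration only, not a reconstruction of any real filtering system: the host names, the addresses (taken from the reserved documentation range) and the keyword list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    host: str   # hypothetical site name
    ip: str     # address the host resolves to (documentation range)
    body: str   # text of the requested page


def blocked_by_ip(page, blocked_ips):
    """Router-style filtering: the decision depends only on the address."""
    return page.ip in blocked_ips


def blocked_by_keyword(page, keywords):
    """Content filtering: the decision depends on the text of the page."""
    text = page.body.lower()
    return any(keyword.lower() in text for keyword in keywords)


# Two unrelated sites share one address; a third sits elsewhere.
pages = [
    Page("news.example", "192.0.2.10", "Report on a banned topic"),
    Page("recipes.example", "192.0.2.10", "How to cook seafood"),
    Page("travel.example", "192.0.2.20", "Hotel listings"),
]
blocked_ips = {"192.0.2.10"}
keywords = ["banned topic"]

ip_result = {p.host: blocked_by_ip(p, blocked_ips) for p in pages}
kw_result = {p.host: blocked_by_keyword(p, keywords) for p in pages}
```

In this toy data the address-based rule also blocks the recipe site, simply because it shares an IP address with the news site, while the keyword rule blocks only the page whose text matches.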


“The Chinese government and associated network authorities are clearly
continuing to experiment with different forms of blocking, indicating
that — unlike Saudi Arabia, which appears to have a single, declared
method of blocking and a much more constant (and apparently smaller) list
of non-sexually explicit blocked sites — Chinese network filtering is an
important instrument of state Internet policy, and one to which significant
technical and human resources continue to be devoted,” the researchers
said.

While China has not released a list of sites blocked or its methodologies
for blocking them, the researchers have been accumulating Web sites and
testing them for accessibility. From March 20 to May 6, 2002, they
connected by modem, through an international telephone call, to dial-up
accounts with several Chinese ISPs. They noted that after May 6, their
modems were unable to successfully negotiate a “handshake” with modems at
any Chinese ISPs, a failure they said was consistent across multiple phone
lines and locations, as well as multiple ISPs and points-of-presence (POPs)
in China. From Aug. 14 to Nov. 12, 2002, the researchers connected to open
proxy servers in China.

They said they conducted testing of only one URL per Web host based on
background knowledge, confirmed in subsequent testing, that when the
default page of a site is filtered, the entirety of the site is typically
filtered.
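
The one-URL-per-host sampling can be sketched as follows. This is an assumed illustration rather than the researchers' own tooling: the function name is hypothetical, and it simply keeps the default page of each host name, ignoring ports and paths.

```python
from urllib.parse import urlsplit


def one_url_per_host(urls):
    """Reduce a URL list to the default page of each distinct host."""
    default_pages = {}
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased by urlsplit; None if absent
        if host and host not in default_pages:
            # Keep only scheme and host: the site's default page.
            default_pages[host] = f"{parts.scheme}://{host}/"
    return list(default_pages.values())


sample = [
    "http://www.example.com/a/b.html",
    "http://WWW.example.com/c",
    "http://example.org/",
]
unique_pages = one_url_per_host(sample)
```

Here the first two entries collapse into a single test of `http://www.example.com/`, since they differ only in path and letter case.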


“On the basis of our testing, both automated and manual, we have reached an
increased understanding of the design of filtering systems used to restrict
Internet access in China,” they reported.


“During testing, we requested 204,012 distinct sites drawn from various Web
indices (such as sites listed within Yahoo! Taiwan’s directory categories)
and search results (such as Google’s top 100 results for a search on ‘China
freedom’). Most sites were accessible from China just as from our standard
Internet connection in the United States, but we found that certain URLs
were consistently unavailable. By attempting to retrieve these sites
repeatedly over time, from multiple locations within China, we drew
inferences on which specific sites among them were intentionally blocked by
Chinese network staff. Our subsequent analysis considers a site to be
blocked if it was found to be inaccessible by our testing system on at
least two distinct occasions from at least two distinct testing locations
in China, and if at those times it was simultaneously reachable from our
main testing location in the United States,” they reported.
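
The rule quoted above can be expressed as a short sketch. It is one reading of the stated criterion, not the researchers' code: the record layout, the field names and the use of a calendar day as an "occasion" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    site: str
    location: str      # hypothetical label for a test point in China
    day: str           # test occasion, here an ISO date
    ok_in_china: bool  # did the request from China succeed?
    ok_in_us: bool     # did the simultaneous U.S. request succeed?


def blocked_sites(observations, min_days=2, min_locations=2):
    """Sites that failed in China, while reachable from the U.S., on at
    least min_days occasions and from at least min_locations locations."""
    days = defaultdict(set)
    locations = defaultdict(set)
    for obs in observations:
        # Count a failure only if the U.S. control request succeeded.
        if not obs.ok_in_china and obs.ok_in_us:
            days[obs.site].add(obs.day)
            locations[obs.site].add(obs.location)
    return {
        site
        for site in days
        if len(days[site]) >= min_days and len(locations[site]) >= min_locations
    }


observations = [
    Observation("a.example", "proxy-1", "2002-08-14", False, True),
    Observation("a.example", "proxy-2", "2002-08-20", False, True),
    Observation("b.example", "proxy-1", "2002-08-14", False, True),
    Observation("b.example", "proxy-1", "2002-08-20", False, True),
    Observation("c.example", "proxy-1", "2002-08-14", False, False),
    Observation("c.example", "proxy-2", "2002-08-20", False, False),
]
result = blocked_sites(observations)
```

In this invented data only `a.example` meets the rule. `b.example` failed from a single location, and `c.example` was down in the United States as well, which points to an outage rather than a block.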


As part of their research, they have made a real-time testing system
available to the public through the Web. Single-instance tests are fairly
inconclusive. For example, an attempt Wednesday to test the Chinese
government’s official site, www.gov.cn, found the site to be “likely
inaccessible in China.”


Edelman noted, “The real-time testing system is currently experiencing an
unprecedentedly high load, which unfortunately has made it somewhat less
accurate than is typically the case. If you get the message ‘reportedly
accessible’ or ‘reportedly inaccessible,’ you can reach the corresponding
conclusion with some degree of certainty, but I’d advise against putting
too much stock in the ‘likely to be…’ reports.”


He added, “Our report uses different (though related) testing methods, as
well as repeated tests over time. For these reasons, it’s likely to be
significantly more reliable than the real-time testing site.”
