RealTime IT News

China's Filtering Tech Evolving

The People's Republic of China blocks roughly 8 percent of online content, according to an ongoing analysis by researchers with the Berkman Center for Internet & Society at Harvard Law School.

While the researchers found that the Chinese government makes some effort to block sexually explicit sites, their findings indicate it is much more interested in blocking content related to news, health and entertainment.

Jonathan Zittrain, co-director of the center and Jack N. and Lillian R. Berkman Assistant Professor of Entrepreneurial Legal Studies at Harvard Law School, and Benjamin Edelman, a technology analyst and first-year student at Harvard Law School, have been tracking Internet filtering in China. Of the 204,012 sites the two have tested to date, more than 50,000 were inaccessible from at least one point in China on at least one occasion. To separate sites that are intentionally blocked from those that were unreachable because of temporary glitches, they applied a stricter test and reported that 18,931 sites (8 percent) were inaccessible from at least two distinct proxy servers in China on at least two days.

Although some effort had been made to block sexually explicit content, Zittrain and Edelman found that such efforts were not particularly effective, blocking only about 13.4 percent of their sample of well-known sexually explicit sites. For instance, while Playboy's and Penthouse's sites were blocked, they found the Hustler Magazine site and adult site whitehouse.com were consistently available. In comparison, the researchers -- who have previously studied Internet filtering in Saudi Arabia and American public libraries -- found that Saudi Arabia blocked 86.2 percent of the same sample. Commercial filtering applications block between 70 percent and 90 percent of those sites.

On the other hand, China more consistently blocks dissident/democracy sites like Amnesty International, Human Rights Watch, the Hong Kong Voice of Democracy, the Direct Democracy Center, and Falun Gong and Falun Dafa sites. It also makes a strong effort to block health-related sites, like the AIDS Healthcare Foundation, the Internet Mental Health reference, and the Health in China research project. On the education front, it blocks a number of university sites, as well as sites for The Learning Channel, the Islamic Virtual School and the Music Academy of Zheng. Of Google's top 100 results for news, 42 were blocked, and the researchers said they found evidence of blocking of 923 sites listed in Yahoo's News and Media directory categories and subcategories. A variety of sites run by governments in Asia and beyond were blocked, including U.S. court sites, state communications organs, and state-sponsored travel sites. Taiwanese, Tibetan and religious sites were consistently blocked, as were several movie sites.

Some of the selections seem a little odd at first, like the apparent decision to block the site for Red Lobster restaurants. But Edelman explained, "It does seem the word 'Red' may be targeted -- perhaps not surprising for the many instances in which it's used to refer to China, especially in a negative light. As applied to any particular site, this is primarily speculation -- but the pattern has become increasingly clear to me via a casual review of the sites."

Edelman said that his findings lead him to believe that some of the rationales for blocking may include:

  • Including content with keywords that lead human or automated classifiers to think that a page is hostile to China (whether or not it is) or that it discusses topics to which the Chinese authorities intentionally restrict access
  • Previously providing such content
  • Sharing an IP address with one or more sites that contain or previously contained such content
  • Sharing a domain name server with one or more sites that contain or previously contained such content.

"Of these, only the first is, strictly speaking, 'intentional' -- while the other categories (and plenty of further classes of sites) are arguably to varying extents 'accidental.' Of course, this all results from an intentional choice not to take the necessary precautions to prevent such accidental blocks," he said.

According to the researchers, the primary means of blocking is at the router level and on the basis of IP address, which means that those who implement the filtering must choose between blocking an entire site on the basis of a small portion of its content and tolerating the entire site. The analysis also found that the Chinese government is experimenting with more technology-intensive and refined content filtering, like blocking by keywords or phrases in any particular HTML page requested by the user. The researchers said such technology appears to be linked to the ability to disable Internet access for a period of time for a user who requests a page with forbidden content. Other nascent forms of filtering appear to attempt to limit the information that can be gleaned from search engines.
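
The trade-off between address-level and page-level filtering can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how any real filter is built: the host names, the reserved documentation addresses, the keyword list and both function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def blocked_by_ip(host: str, host_to_ip: Dict[str, str], blocked_ips: Set[str]) -> bool:
    """Address-level rule: every host on a blocked address is cut off."""
    return host_to_ip[host] in blocked_ips


def blocked_by_keyword(page_text: str, keywords: Iterable[str]) -> bool:
    """Page-level rule: block only a page whose text contains a listed phrase."""
    text = page_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical data: two sites share one address, a third sits elsewhere.
host_to_ip = {
    "targeted.example": "192.0.2.10",
    "bystander.example": "192.0.2.10",
    "elsewhere.example": "192.0.2.99",
}
blocked_ips = {"192.0.2.10"}
keywords = ["forbidden phrase"]

# Blocking one address also cuts off the unrelated site that shares it.
for host in host_to_ip:
    print(host, blocked_by_ip(host, host_to_ip, blocked_ips))

# A keyword rule decides page by page instead of site by site.
print(blocked_by_keyword("A page that mentions a Forbidden Phrase.", keywords))
print(blocked_by_keyword("A page about seafood restaurants.", keywords))
```

In this model the address rule is cheap to apply but cannot tell the targeted site from its neighbor, while the keyword rule has to inspect each page and can still misfire on an innocuous page that happens to contain a listed word.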

"The Chinese government and associated network authorities are clearly continuing to experiment with different forms of blocking, indicating that -- unlike Saudi Arabia, which appears to have a single, declared method of blocking and a much more constant (and apparently smaller) list of non-sexually explicit blocked sites -- Chinese network filtering is an important instrument of state Internet policy, and one to which significant technical and human resources continue to be devoted," the researchers said.

While China has not released a list of sites blocked or its methodologies for blocking them, the researchers have been accumulating Web sites and testing them for accessibility. From March 20 to May 6, 2002, they connected by modem, through an international telephone call, to dial-up accounts with several Chinese ISPs. They noted that after May 6, their modems were unable to successfully negotiate a "handshake" with modems at any Chinese ISPs, a failure they said was consistent across multiple phone lines and locations, as well as multiple ISPs and points-of-presence (POPs) in China. From Aug. 14 to Nov. 12, 2002, the researchers connected to open proxy servers in China.

They said they tested only one URL per Web host, based on background knowledge, confirmed in subsequent testing, that when the default page of a site is filtered, the entire site is typically filtered.

"On the basis of our testing, both automated and manual, we have reached an increased understanding of the design of filtering systems used to restrict Internet access in China," they reported.

"During testing, we requested 204,012 distinct sites drawn from various Web indices (such as sites listed within Yahoo! Taiwan's directory categories) and search results (such as Google's top 100 results for a search on 'China freedom'). Most sites were accessible from China just as from our standard Internet connection in the United States, but we found that certain URLs were consistently unavailable. By attempting to retrieve these sites repeatedly over time, from multiple locations within China, we drew inferences on which specific sites among them were intentionally blocked by Chinese network staff. Our subsequent analysis considers a site to be blocked if it was found to be inaccessible by our testing system on at least two distinct occasions from at least two distinct testing locations in China, and if at those times it was simultaneously reachable from our main testing location in the United States," they reported.

As part of their research, they have made a real-time testing system available to the public through the Web. Single-instance tests, however, are fairly inconclusive. For example, an attempt Wednesday to test the Chinese government's official site, www.gov.cn, found the site to be "likely inaccessible in China."

Edelman noted, "The real-time testing system is currently experiencing an unprecedentedly high load, which unfortunately has made it somewhat less accurate than is typically the case. If you get the message 'reportedly accessible' or 'reportedly inaccessible,' you can reach the corresponding conclusion with some degree of certainty -- but I'd advise against putting too much stock in the 'likely to be...' reports."

He added, "Our report uses different (though related) testing methods, as well as repeated tests over time. For these reasons, it's likely to be significantly more reliable than the real-time testing site."
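
The report's repeated-test rule, under which a site counts as blocked only if it failed on at least two distinct occasions from at least two distinct locations in China while remaining reachable from the United States, can be sketched roughly as follows. This is one reading of the quoted rule, not the researchers' code: the `Probe` record, its field names, the function name and the sample data are all hypothetical, and "occasion" is approximated here by calendar day.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Probe(NamedTuple):
    """One retrieval attempt; every field name here is hypothetical."""
    site: str
    location: str      # label for a testing location in China
    day: str           # date of the attempt, e.g. "2002-08-14"
    ok_in_china: bool  # did the request from China succeed?
    ok_in_us: bool     # did the simultaneous request from the US succeed?


def blocked_sites(probes: Iterable[Probe]) -> Set[str]:
    """Return sites that meet the two-locations, two-days rule."""
    fail_locations = defaultdict(set)
    fail_days = defaultdict(set)
    for p in probes:
        # Count a failure only if the site was up when fetched from the US.
        if not p.ok_in_china and p.ok_in_us:
            fail_locations[p.site].add(p.location)
            fail_days[p.site].add(p.day)
    return {
        site
        for site in fail_locations
        if len(fail_locations[site]) >= 2 and len(fail_days[site]) >= 2
    }


probes = [
    Probe("a.example", "proxy-1", "2002-08-14", False, True),
    Probe("a.example", "proxy-2", "2002-08-20", False, True),
    Probe("b.example", "proxy-1", "2002-08-14", False, True),   # one failure only
    Probe("c.example", "proxy-1", "2002-08-14", False, False),  # down everywhere
    Probe("c.example", "proxy-2", "2002-08-20", False, False),
]
print(blocked_sites(probes))  # {'a.example'}
```

A single failed request, like the one recorded for `b.example`, is not enough to label a site blocked under this rule, which is consistent with Edelman's caution about one-off results from the real-time tester.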