Content Filtering Technologies and Internet Service Providers
Michael Shepherd and Carolyn Watters [1]
Web Information Filtering Lab
Faculty of Computer Science
Dalhousie University
6050 University Avenue
Halifax, Nova Scotia, Canada B3H 1W5
Executive Summary
March 22, 2000
This report describes the mechanisms that Internet Service Providers
(ISPs) can choose to provide, and that users can choose to use, to
filter the content delivered to them over the Internet and to allow
authorized access to that content. The report is purely descriptive of
the filtering mechanisms available and does not provide policy or
legal advice or recommendations. It was commissioned by Industry
Canada to help promote the development, awareness and use of tools and
technologies that enable Internet users to make choices about the
content that they access on the Internet.
Classification of Mechanisms
The mechanisms are classified into two tiers: the application-level
mechanisms and the underlying core technologies. The core technologies
are classified as follows:
- Site labels
Labeling refers to schemes that assign content-related labels to
URLs and/or specific Web pages. The URL (Uniform Resource Locator)
describes the location of a specific Web page. Rating protocols
generally exist separately from the products or applications that use
the ratings. These labels can be stored as part of the Web page or
separately from it in a database. Labels may be the result of
self-rating, third-party authority rating, or community rating by
interested users. [2]
- Lists of appropriate or inappropriate sites ("white" and
"black" lists)
The most frequently used filtering technology is the use of lists of
acceptable and/or unacceptable URLs. "White" lists define a domain of
"safe" Web sites within which users can browse; such lists are
typically compiled by people who search for and select sites approved
by the provider of the list. "Black" lists are lists of URLs from
which requests will not be serviced. The lists are compiled as a
service by individuals or by communities of raters. (A small
illustrative sketch of list checking and keyword matching follows
this list.)
- Automated text analysis
Another way to analyze a Web site is to use software that scans the
text of a site to determine the relevance or suitability of pages.
Users or groups of users have profiles of interest (positive and/or
negative), consisting of keywords and phrases, that are used in this
determination. Almost all content-based filtering uses some variation
of keyword matching, in which keywords from a profile of interest are
compared against the keywords occurring in the content of the specific
Web page. Text analysis is also used to screen search terms from
search queries.
- Authorization
Encryption, password protection, and credit card validation
techniques are used to verify that a user is authorized to access
given services or data.
- Activity tracing
Internet usage can be traced by using the server log files and other
data logs. These files store details of all Web accesses and can be
used to analyze Web-related activities.
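To make the list-based and keyword-based techniques above concrete, here is a minimal Python sketch; it is not taken from the report, and the host lists, keyword profile, and threshold are hypothetical placeholders.

```python
# Hypothetical sketch of URL list checking plus keyword-profile matching.
from urllib.parse import urlparse

WHITE_LIST = {"kids.example.org"}              # hypothetical "safe" hosts
BLACK_LIST = {"www.example-blocked.com"}       # hypothetical blocked hosts
NEGATIVE_KEYWORDS = {"keyword1", "keyword2"}   # placeholder profile terms

def allowed_by_lists(url, use_white_list=False):
    """Pass the URL through a white-list or black-list check."""
    host = urlparse(url).hostname or ""
    if use_white_list:
        return host in WHITE_LIST      # only pre-approved hosts are served
    return host not in BLACK_LIST      # everything is served except blocked hosts

def allowed_by_text_analysis(page_text, max_hits=0):
    """Rough keyword matching of page text against a negative profile."""
    words = {w.strip(".,;:!?\"'()").lower() for w in page_text.split()}
    return len(words & NEGATIVE_KEYWORDS) <= max_hits

print(allowed_by_lists("http://www.example-blocked.com/a.html"))  # False
print(allowed_by_text_analysis("an innocuous page of text"))      # True
```

In a deployed system these checks could run at the ISP, at the client, or at both, using lists and profiles maintained by the provider or the user.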
The filtering applications that are built on these underlying
technologies have been classified as follows:
- Special purpose browsers for children
Browser applications can be written that are targeted at child users.
Such applications can provide easier search strategies and friendlier
graphics, remove advertisements, and provide filtering and search-safe
domains in a way that is transparent to the user.
- Child-friendly search engines and portals
The idea behind both special purpose child-friendly search engines
and portals is to use a third party gateway to Web content.
Child-friendly portals are Web access sites that try to provide a
domain of safe sites for the user to explore. As long as the user
comes in through the portal, they view a pre-selected subset of
the Web.
- Proxy applications
Proxy software runs at the ISP and acts as an intermediary between
the client or browser and the Internet. Application software can be
added to proxy server modules to perform text analysis and URL list
comparisons on each browser request and response (see the sketch
after this list).
- Activity monitors
Rather than restrict or control access to Web sites proactively,
these applications monitor and log Internet activity for parental
review.
- Restricted access applications
Applications residing on the host site can be written that restrict
access to services or data on that site to authorized users. These
applications may encrypt the data so that only authorized users can
decrypt and view the data.
- Non-HTTP applications
In addition to Web page access, applications can be written using
these core technologies to filter content of email and to control
access to ftp sites, telnet hosts, discussion and chat groups, and
newsgroups.
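As an example of the proxy approach described above, the following is a rough sketch, not the report's implementation, of an ISP-side filtering proxy in Python; the port number and blocked-host list are hypothetical.

```python
# Hypothetical sketch of a filtering proxy: each browser request is checked
# against a black list before the page is fetched and relayed.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse
from urllib.request import urlopen

BLOCKED_HOSTS = {"www.example-blocked.com"}    # hypothetical black list

class FilteringProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the full URL in the request line.
        host = urlparse(self.path).hostname or ""
        if host in BLOCKED_HOSTS:
            self.send_error(403, "Blocked by filtering policy")
            return
        try:
            with urlopen(self.path, timeout=10) as upstream:
                content_type = upstream.headers.get("Content-Type", "text/html")
                body = upstream.read()
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            self.send_error(502, "Could not fetch the requested page")

if __name__ == "__main__":
    # Point a browser's HTTP proxy setting at localhost:8080 to try it out.
    ThreadingHTTPServer(("", 8080), FilteringProxy).serve_forever()
```

In practice such a proxy module could also apply the text analysis and label checks described earlier to the fetched body before relaying the response.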
Potential of the Core Technologies
All of these core technologies have a role to play in filtering
mechanisms for the Web. None of these core technologies provides a
long-term solution on its own. Systems will need to combine the
technologies in innovative ways to provide effective solutions. In
particular, we note that:
- Site labeling systems are the most flexible and perhaps hold the most
promise for the future. Labels may be assigned by the content provider,
third party rating services, communities of users, and/or individual
users. It is the responsibility of the ISP and/or of the client to use
the labels that have been assigned.
- URL lists are the most effective in controlling domains of access. This
method is particularly good for creating child-friendly sites.
However, the use of lists does not provide the flexibility of labels
and the Web is growing so quickly it is very hard to keep lists
up-to-date.
- Accurate automated analysis of the content of Web sites is problematic
at best, due to the vagaries of natural language, the difficulties of
cross-language filtering, and the difficulties of determining the
content of graphics and images. Using this technology to assist in the
labeling of sites based on text categorization techniques does,
however, hold some promise.
- Access authorization can be effective as a reverse filter, restricting
who can have access to a given site.
- Activity tracing can only monitor what has been done; it cannot
actually filter out any material.
Although Web crawlers are not considered a core technology, they can
act as rating agents that rate sites proactively based on content
analysis algorithms. Although this
has not proven very accurate to date, there is potential for
improvement in this area.
It must be emphasized that none of the above technologies are 100%
effective and that the content of the Web is, by its very nature,
volatile. To be (more) effective, these technologies have to be used
in combinations and in layers, both at the ISP and at the client. As
these applications and core technologies are used in combination, and
as most can be applied at either the ISP or the client or both, there
is no clear recommendation as to the best method or where it is best
applied.
The areas that hold the greatest potential and where future efforts
should be focused include:
- The development of architectures that combine various mechanisms to
work in collaboration or as layers within the architecture. For
example, labels managed by a Label Bureau, with lists managed by the
ISP, and with content analysis at the client (a layered decision is
sketched after this list).
- Further research into Web crawlers and agents for the proactive
categorization of Web sites.
- The development of architectures supporting collaborative filtering
for communities of users.
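A hedged sketch of the layered idea in the first item above, assuming a label lookup supplied by a Label Bureau, an ISP-managed black list, and a client-side keyword profile; all of the data below is a hypothetical placeholder.

```python
# Hypothetical layered decision: label check, then ISP list check,
# then client-side content analysis.
BUREAU_LABELS = {"http://www.example.com/page.html": "restricted"}  # stub label bureau
ISP_BLACK_LIST = {"www.example-blocked.com"}
CLIENT_KEYWORDS = {"keyword1", "keyword2"}

def passes_all_layers(url, host, page_text):
    if BUREAU_LABELS.get(url) == "restricted":   # layer 1: third-party label
        return False
    if host in ISP_BLACK_LIST:                   # layer 2: ISP-managed list
        return False
    words = set(page_text.lower().split())       # layer 3: client content analysis
    return not (words & CLIENT_KEYWORDS)
```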
While virtually all of the techniques reviewed in this report can be
implemented by an ISP, it is important that the use of filtering is
transparent to the user. The user should be informed when filtering is
in place or has occurred. To be effective, it is important that:
- The user knows when content has been filtered and why; and
- The user knows the criteria for filtering, i.e., what is on the list
or in the filter.
Given the dynamic nature of the Web, a concerted and continuing effort
into the development, evaluation, and maintenance of filtering and
access control mechanisms will be required at all levels, including
government, community, ISP, and individual.
Footnotes
1. With Margo Boyd, Research Assistant.
2. Balkin, J., B. Noveck and K. Roosevelt. 1999. Filtering the
Internet: A best practices model. In Protecting Our Children on the
Internet: Towards a New Culture of Responsibility. Bertelsmann
Foundation Publishers.