The Risks of Metadata

Introduction

Over the past several years, there have been a number of incidents in which “document metadata” has caused professional and political embarrassment. The metadata reveals, sometimes contrary to public assertions, how, when and by whom a document was created and into whose hands it travelled. In this fact sheet, we look at the risks associated with metadata and offer some suggestions on how you can minimize those risks.

What is metadata?

Metadata is usually defined as “data about data” or “information about information”. Think of it as a hidden layer of extra information that is automatically created and embedded in a computer file.

A familiar analogy is the label on a can of soup. The label contains, in a standardized, structured format, information about the contents of the can (e.g., the type of soup, who made it, the ingredients, the nutritional value and so on). In a similar fashion, the metadata associated with a document (in the form of keywords, for instance) can provide information about the contents of the document.

Whenever a document is created, edited or saved, metadata is added to it. This information accompanies the document whenever it is sent in electronic form (e.g., as an attachment to an e-mail) to other groups or individuals, inside or outside an organization. The metadata may contain sensitive information that could be inadvertently disclosed to unauthorized individuals or groups.

For the purposes of this fact sheet, we will be referring to metadata associated with electronic documents. Examples of metadata include:
As you can see, a substantial amount of “extra” information is associated with electronic documents. Because the metadata is not readily visible, and because the affected applications may not provide any mechanism to warn users that comments are embedded or that attached documents contain metadata, you may unknowingly send confidential information to people outside your organization. The same risks apply if you post certain kinds of documents to your website.

What are the risks associated with metadata?

The software applications that seem to be most affected by the metadata issue are office productivity applications such as Microsoft Word, Excel and PowerPoint, Corel WordPerfect, Sun’s StarOffice and OpenOffice (a multi-platform open standards office suite). The use of collaboration features built into these applications (e.g., comments and Track Changes), along with features intended to enhance productivity (e.g., the Fast Saves option in Word), results in metadata being added to a document. Some of these applications (e.g., StarOffice) save the metadata in a separate file.

Metadata is a classic case of a double-edged sword: it can be both helpful and harmful. For example, document metadata supports intelligent information categorization and searching (e.g., through the use of keywords), version control and workflow. The ability to view other people’s comments and suggested changes to a document, using the Track Changes feature, is central to collaborating with co-workers on a project. However, changes that are not accepted remain with the document even though they are not readily visible (they can be displayed by turning on the “Show markup view”). They could be inadvertently exposed to unauthorized individuals whenever the document is shared as an e-mail attachment, passed along on a floppy disk or CD-ROM, or posted to a website.
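To make the point about hidden information concrete, the following is a minimal sketch of how such metadata can be read back out of a file. It rests on an assumption the fact sheet does not make: the document is in the newer .docx format (a ZIP archive of XML parts), not the binary .doc format. The function names audit_docx and build_sample_package are hypothetical, and the sample package contains only the parts the sketch reads, so it is not a valid Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the package parts read below.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}

# Report label -> element in docProps/core.xml.
CORE_FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def _read_part(package, name):
    """Return the parsed XML root of a package part, or None if it is absent."""
    if name not in package.namelist():
        return None
    return ET.fromstring(package.read(name))


def audit_docx(source):
    """Summarise hidden information in a .docx file (path or file-like object)."""
    report = {}
    with zipfile.ZipFile(source) as package:
        # Document statistics: who created and last saved the file, and when.
        core = _read_part(package, "docProps/core.xml")
        for label, tag in CORE_FIELDS.items():
            node = core.find(tag, NS) if core is not None else None
            report[label] = node.text if node is not None else None

        # Tracked changes that have not been accepted or rejected.
        body = _read_part(package, "word/document.xml")
        for label, tag in (("tracked_insertions", "w:ins"), ("tracked_deletions", "w:del")):
            report[label] = len(body.findall(f".//{tag}", NS)) if body is not None else 0

        # Reviewer comments are stored in their own part.
        comments = _read_part(package, "word/comments.xml")
        report["comments"] = len(comments.findall("w:comment", NS)) if comments is not None else 0
    return report


def build_sample_package():
    """Hypothetical helper: an in-memory package with only the parts read above."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>A. Author</dc:creator>"
        "<cp:lastModifiedBy>B. Reviewer</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    body = (
        f'<w:document xmlns:w="{NS["w"]}"><w:body><w:p>'
        "<w:ins><w:r><w:t>new wording</w:t></w:r></w:ins>"
        "<w:del><w:r><w:delText>old price</w:delText></w:r></w:del>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for field, value in audit_docx(build_sample_package()).items():
        print(f"{field}: {value}")
```

Running a check like this on a document before it leaves the organization shows whether author names, revision counts, unaccepted changes or comments are still travelling with it. It only reports what is present; it does not remove anything.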
Financial Impact

There may be fines or other financial penalties levied against an organization as a result of the exposure of sensitive information. Microsoft Word document statistics (e.g., when the document was created, modified, accessed and/or printed) are included in the metadata, along with the revision number, the total time spent editing the document, the names of the people who worked on the document and the filenames under which the document existed. If these statistics do not match information provided to a client for billing purposes, for example, this could result in embarrassment or financial penalties for the organization.

Competitive Disadvantage

If an existing document is used as a template for a new document, information specific to the previous use (e.g., client information, pricing, comments, etc.) could be stored as hidden information in the new document. If a competitor is able to obtain a copy of the document, it may be able to retrieve the hidden information and offer preferential pricing to lure away a customer.

Another example would be where several individuals have collaborated on the preparation of a document outlining the features of a new product, using Track Changes, comments or the versioning feature in Microsoft Word. In this case, sensitive information may be included in changes that are not accepted (e.g., it may be decided to exclude certain product features from the literature because they are not quite ready) or in earlier versions of the document. This information could be accessed by a competitor, who could then use that knowledge to its advantage (for example, by releasing a product containing the excluded features and undercutting your market position).

Regulatory Impact

In one reported incident, an error in sending out an e-mail resulted in a company’s full-year profit results being potentially exposed before they were finalized and submitted to the appropriate regulatory body. Apart from the embarrassment and potential impact on the company’s share price, the incident resulted in an investigation being launched by the regulatory body. [1]

What can you do to protect yourself?

There are a number of steps you can take to mitigate the risk of information leakage via document metadata. These include, but are by no means limited to, the following:
Where to go for more information

As mentioned above, there are a number of companies that have developed software tools to help remove metadata from documents before they are shared or made public. These companies often provide advice and guidance on removing metadata from documents.

Microsoft has also published a number of How-to guides for removing metadata from Office documents. These guides can be found in the Microsoft Knowledge Base as follows:
The Corel Knowledge Base (http://support.corel.com/scripts/rightnow.cfg/php.exe/enduser/std_alp.php) contains two entries (Answer ID 753605 and 759035) that address the topic of removing metadata from WordPerfect Office documents.

[1] Iain Ferguson, ZDNet Australia, “Hit send...and regret it”, at http://www.zdnet.com.au/news/communications/soa/ July 2006

Date published: 2006-07-19