RESEARCH METHODOLOGY
THE USE OF PDF AND OTHER FILE FORMATS IN RESEARCH
ATHOY OLATUBOSU E. Arc 01 9211 April, 2008.
TABLE OF CONTENTS
1 INTRODUCTION
2 PDF FILE FORMAT
2.1 History of PDF
2.2 Technical foundations
2.3 Fonts
2.4 How to view PDF files
2.5 How to print PDF files
2.6 How to Save PDF Files
2.7 Today’s use of PDF
3 XPS FILE FORMAT
3.1 Microsoft XPS documents
3.2 Getting Started
3.3 XPS document writer
3.4 Creating XPS documents in the 2007 Microsoft Office system
3.5 XPS document viewing
3.6 High-fidelity onscreen graphics
3.7 Managing XPS documents
3.8 Improved Windows printing experience
3.9 Technology
3.10 Features
3.11 Similarities with PDF
3.12 Hardware
3.13 Software
3.14 Licensing
3.15 Integrate XPS in Your Application
4 WICD CORE 1.0 FILE FORMAT
5 CONCLUSION
6 REFERENCES
1
INTRODUCTION
Documents, especially reports and research works, are now produced with word processors on computers. These documents have to be saved in a particular file format, and formats vary between software packages. In many cases there is also a need to create documents that are portable and can be viewed and printed by people using different computers, operating systems and printers. The most widely used file formats are the ‘DOC’ and ‘PDF’ formats. Recently introduced with Microsoft Word 2007 is the ‘DOCX’ format, which is compatible with the other programs in the Microsoft Office 2007 suite. File formats are the rules that specify, for each type of data, which sequences of bits mean what. A file format specification contains all of the information needed to write and read that kind of file in any computer program. This paper highlights the use of these formats in the presentation of documents, their merits and demerits, and how to use them effectively.
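The idea that a format fixes "which sequences of bits mean what" can be illustrated with the signature bytes that many formats place at the start of a file. The following is a minimal sketch in Python, not a complete detector: the function name guess_format and the descriptive labels are illustrative choices, and only three well-known signatures are checked.

```python
# Leading "magic" bytes of three common document container formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP-based package (for example DOCX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (for example legacy DOC)"),
]


def guess_format(data: bytes) -> str:
    """Guess a file format from the first bytes of its content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(guess_format(b"%PDF-1.7\n"))
    print(guess_format(b"plain text"))
```

In practice the first few bytes would be read from disk, for example with open(path, "rb").read(8). A matching signature only suggests the format; a full reader still has to follow the format specification.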
2
PDF FILE FORMAT
The Portable Document Format (PDF) is the file format created by Adobe Systems in 1993 for document exchange. PDF is a fixed-layout format used for representing two-dimensional documents in a manner independent of the application software, hardware, and operating system. Each PDF file encapsulates a complete description of a 2-D document (and, with Acrobat 3-D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics that compose the document. PDF is an open standard, and it recently took a major step towards becoming ISO 32000. PDF captures formatting information from a variety of desktop publishing applications, making it possible to send documents and have them appear on the recipient's monitor (or printer) as they were intended to be viewed. A properly prepared PDF maintains the original fonts, images and graphics as well as the exact layout of the file (think of it as an electronic snapshot). A PDF file can be shared, viewed, and printed by anyone using the free Adobe Reader software, regardless of the operating system, original design application or fonts.
Originally, PDF was mostly used by graphic artists, designers and publishers for producing colour page proofs. With its evolving technology, however, PDF is today used for virtually any data that needs to be exchanged among applications and users. The file format specification is open and available to anyone who wants to develop tools to create, view or manipulate PDF documents. PDF can be used to:
Share files with others who don't have the same software
Share files with others who use a different platform (Mac, Windows, Linux, etc.)
Share files that will look the same (layout, fonts) on multiple computer systems
Share files that can be protected from unauthorized viewing, printing, copying, or editing
Publish electronic documents, ebooks, etc.
Print files to many different types of printers, and all look essentially the same
Create files with annotations, hyperlinks, and bookmarks that can be shared via email and on the Web
Create interactive forms that can be shared via email and the Web
Create files that are more efficient than PostScript or native file formats typically used in commercial printing
2.1
History of PDF
When PDF first came out in the early 1990s, its adoption was slow. At that time, both the PDF-creation tools (Acrobat) and the viewing and printing software had to be bought. Early versions of PDF had no support for external hyperlinks, which reduced its usefulness on the World Wide Web; the additional size of a PDF document compared to plain text meant significantly longer download times over the slower modems common at the time; and rendering the files was slow on less powerful machines. Additionally, there were competing formats such as Envoy, Common Ground Digital Paper and even Adobe's own PostScript format (.ps). In those early years, PDF was mainly popular in desktop publishing workflows. Adobe soon started distributing its Acrobat Reader (now Adobe Reader) program at no cost and continued supporting the original PDF, which eventually became the de facto standard for printable documents on the web. The PDF file format has changed several times as new versions of Adobe Acrobat were released. There have been eight versions of PDF with corresponding Acrobat releases:
(1993) - PDF 1.0 / Acrobat 1.0
(1994) - PDF 1.1 / Acrobat 2.0
(1996) - PDF 1.2 / Acrobat 3.0
(1999) - PDF 1.3 / Acrobat 4.0
(2001) - PDF 1.4 / Acrobat 5.0
(2003) - PDF 1.5 / Acrobat 6.0
(2005) - PDF 1.6 / Acrobat 7.0
(2006) - PDF 1.7 / Acrobat 8.0
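A PDF file announces its version in its first line, in the form "%PDF-1.4". The following Python sketch reads that header and looks up the matching Acrobat release from the list above. The names pdf_version and ACROBAT_BY_PDF_VERSION are illustrative, and the sketch assumes the header sits at the very first byte of the data.

```python
import re

# Version-to-release table taken from the list above.
ACROBAT_BY_PDF_VERSION = {
    "1.0": "Acrobat 1.0",
    "1.1": "Acrobat 2.0",
    "1.2": "Acrobat 3.0",
    "1.3": "Acrobat 4.0",
    "1.4": "Acrobat 5.0",
    "1.5": "Acrobat 6.0",
    "1.6": "Acrobat 7.0",
    "1.7": "Acrobat 8.0",
}


def pdf_version(data: bytes):
    """Return the version string from a PDF header, or None if absent."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    version = pdf_version(b"%PDF-1.4\n% rest of the file")
    print(version, ACROBAT_BY_PDF_VERSION.get(version))
```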
2.2
Technical foundations
Anyone may create applications that read and write PDF files without having to pay royalties to Adobe Systems; Adobe holds patents to PDF, but licenses them for royalty-free use in developing software that complies with its PDF specification. PDF combines three technologies:
A subset of the PostScript page description programming language, for generating the layout and graphics.
A font-embedding/replacement system to allow fonts to travel with the documents.
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
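The three technologies listed above can be seen together in a very small hand-written PDF. The Python sketch below is an illustration under stated assumptions, not a general PDF writer: build_minimal_pdf is a hypothetical helper, the text must not contain parentheses or backslashes (they would need escaping), the font is the un-embedded standard font Helvetica, and no compression is applied.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Build a one-page PDF by hand (sketch; text must avoid '(', ')' and '\\')."""
    # 1. PostScript-like page description: select font F1 at 24 pt,
    #    move to (72, 720) and show the text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")

    # 3. Structured storage: numbered objects that refer to each other.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length " + str(len(content)).encode("ascii") + b" >>\n"
        b"stream\n" + content + b"\nendstream",
        # 2. Font system: an un-embedded reference to a standard font.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_position = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_position}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

The cross-reference table at the end records the byte offset of every object, so a reader can jump straight to any object without scanning the whole file.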
2.3
Fonts
A font object in PDF is a description of a digital typeface. It may either describe the characteristics of a typeface, or it may include an embedded font file. The latter case is called an embedded font, while the former is called an un-embedded font. The font files that may be embedded are based on widely used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType, and (beginning with PDF 1.6) OpenType. Additionally, PDF supports the Type 3 variant, in which the components of the font are described by PDF graphic operators.
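The difference between an embedded and an un-embedded font can be sketched by modelling PDF font dictionaries as plain Python dictionaries. In the PDF specification, an embedded font program is attached to the font descriptor under the key FontFile (Type 1), FontFile2 (TrueType) or FontFile3 (CFF and OpenType). The helper is_embedded, the font name DemoSans and the placeholder byte string are hypothetical and serve only as illustration.

```python
# Font descriptor keys under which a PDF stores an embedded font program.
EMBEDDED_FONT_KEYS = ("FontFile", "FontFile2", "FontFile3")


def is_embedded(font: dict) -> bool:
    """Report whether a font dictionary carries an embedded font program."""
    descriptor = font.get("FontDescriptor", {})
    return any(key in descriptor for key in EMBEDDED_FONT_KEYS)


# Un-embedded: only the characteristics (here just the name) are recorded.
unembedded_font = {"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"}

# Embedded: the descriptor also carries the font program itself.
embedded_font = {
    "Type": "Font",
    "Subtype": "TrueType",
    "BaseFont": "DemoSans",  # hypothetical font name
    "FontDescriptor": {
        "FontName": "DemoSans",
        "FontFile2": b"placeholder for TrueType font bytes",
    },
}

if __name__ == "__main__":
    print(is_embedded(unembedded_font), is_embedded(embedded_font))
```

Type 3 fonts fall outside this check: their glyphs are drawn with graphic operators stored in the PDF itself, so they have no font file entry at all.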
2.4
How to view PDF files
Once the Adobe® Reader® software is properly installed, you just need to click on a link to a .pdf file and it will be loaded for viewing on your computer. Note that the file is sent to your computer as a temporary file that will be deleted when you exit your browser. PDF files are indicated by either the letters [PDF or pdf] or a graphic following the filename.
2.5
How to print PDF files
When printing PDF files from within your web browser, do NOT use the web browser print facility. Instead, use the print button at the left end of the special Adobe® Reader® toolbar, which appears immediately above the viewing window. See illustration below for location of this print button.
2.6
How to Save PDF Files
To save the file for later use once you have loaded it for viewing, on most browsers you just select "File" then "Save As..." from the menu bar. To just save without viewing, place the cursor over the link to the PDF file, hold down the shift key and click the mouse. This should bring up the "Save As" window.
2.7
Today’s use of PDF
It is becoming increasingly easy to create PDF files because, from a user's standpoint, the process is almost as simple as printing. Essentially, anything that can be done with a sheet of paper can be done with a PDF. PDF technology is being used more frequently to produce offset-printed documents (provided the designer properly embeds fonts and images). Adding to mainstream adoption is the fact that many applications allow users to save, import or export a document as a PDF (including popular publishing programs like QuarkXPress and CorelDraw), and a variety of third-party PDF conversion tools are also available. With the capability to embed metadata (data about data) in a PDF file, along with security options and electronic signatures, PDF is also becoming a standard for data archiving. It took a few years, and much dedication by the development team at Adobe, to perfect, but today more and more people are turning to PDF as the solution for needs not even thought of in 1993.
Keep information secure: Digitally sign or password-protect Adobe PDF documents created with Adobe Acrobat® 7.0 or Adobe LiveCycle™ software.
Searchable: Leverage full-text search features to locate words, bookmarks, and data fields in documents.
Accessible: Adobe PDF documents work with assistive technology to help make information accessible to people with disabilities.
Use PDF if ...
1. You want the user to have a high quality file with the same fonts, colours, and graphics as the original.
2. You want the user to complete a form, but not manipulate the rest of the document.
3. You want a document that can easily be read on screen and printed in high quality.
4. You want a document that is small in size and easy to share.
5. You want a document that can be viewed cross-platform.
Don’t Use PDF if ...
1. You want to keep it very simple. PDF brings the hassle of converting the file and downloading the reader software.
2. You want the user to edit the document. Instead, create the document in a shared application such as Microsoft Word.
3. You don't care about formatting or printing. Instead, just create a web page.
3
XPS FILE FORMAT
3.1
Microsoft XPS documents
The XML Paper Specification (XPS), formerly codenamed "Metro", is a specification for a page description language and a fixed-document format developed by Microsoft. It is an XML-based (more precisely XAML-based) specification, based on a new print path and a colour-managed vector-based document format which supports device independence and resolution independence. It can be used to archive content in a standardized format or to publish content in an easily viewable form. You can also use this format to ensure that no one is able to edit your original work. It provides users and developers with a robust, open and trustworthy format for electronic paper. The XML Paper Specification describes electronic paper in a way that can be read by hardware, read by software, and read by people. XPS documents print better, can be shared more easily, are more secure and can be archived with confidence.
3.2
Getting Started
Windows Vista makes it easy to get started with XPS documents. Windows Vista lets you generate XPS documents from any application by simply selecting the Microsoft XPS Document Writer as the printer when printing. In Windows Vista, you can double-click an XPS document to automatically open the document inside an XPS viewer. You can also download the tools to provide these features and get started with XPS documents in earlier versions of Windows.
3.3
XPS document writer
Whether you are working with documents in Microsoft Word, photos in Microsoft Paint, or a web order form being viewed in Internet Explorer, if the application has the ability to print, then the Microsoft XPS Document Writer will be available through the Print dialog of the application. Use the XPS Document Writer from any application to publish content as an XPS document. After you have created your XPS document, you can use Windows Vista to add "tags" (custom keywords) to your document to make it even easier to find. Windows Vista also has all the necessary components to index XPS documents, so you can instantly find your XPS documents by filename, author, or even by text contained within the document itself.
3.4
Creating XPS documents in the 2007 Microsoft Office system
With a free download, you can also "Save as XPS" directly from the following 2007 Microsoft Office system programs: Access, Excel, InfoPath, OneNote, PowerPoint, Publisher, Visio, and Word. Rather than going through the print menu, you can just use the "File | Save As" option to create an XPS version of your document, presentation, or spreadsheet directly from the authoring application. From the 2007 release, you can save as or publish directly to XPS.
3.5
XPS document viewing
The XPS Viewer is installed by default in Windows Vista. The viewer is hosted within Internet Explorer 7. This Internet Explorer-hosted viewer and the XPS Document Writer are also available to Windows XP users when they download the .NET Framework 3.0. The IE-hosted viewer supports digital rights management and digital signatures. Users who do not wish to view XPS documents in the browser can download the XPS Essentials Pack, which includes a standalone viewer and the XPS Document Writer. The XPS Essentials Pack also includes providers to enable the iPreview and iFilter capabilities used by Windows Desktop Search, and shell handlers to enable thumbnail views and file properties for XPS documents in Windows Explorer. The standalone viewer, however, does not support digital signatures.
The XPS Essentials Pack is available for Windows XP, Windows Server 2003 and Windows Vista. Installing downlevel XPS support enables operating systems prior to Windows Vista to use the XPS print processor instead of the GDI-based WinPrint, which can produce better quality prints on printers that support XPS in hardware (that is, printers that directly consume the format). The print spooler format on these operating systems, however, remains unchanged.
3.6
High-fidelity onscreen graphics
XPS documents feature support for high-fidelity text and graphics. When the document is magnified, the text always appears smooth, clear, and accurate at any size.
XPS document and non-XPS document magnified.
3.7
Managing XPS documents
Digitally signing your XPS documents
You can also "sign" XPS documents to ensure their integrity as they travel from point A to point B. For example, in a corporate environment that supports document signing certificates, you can digitally sign an XPS document directly within the XPS Viewer. In many countries, these digital signatures are considered to be legally valid signatures. If the document is modified or tampered with by a malicious individual, recipients viewing the changed document are notified that the signature is no longer valid. Digitally sign your XPS document to ensure the integrity of its contents.
Applying rights management to your XPS documents
If your company has already deployed Microsoft Windows Rights Management Services (RMS), you can use those services to add an additional layer of protection to your XPS documents. With built-in RMS support, you can set specific access permissions on any XPS document, allowing you to protect sensitive information even after the document is published and shared. You can use the XPS rights management capabilities to designate who can read your document, copy text from it, or print it. With XPS Rights Management Services, you can even set an expiration date after which access to the document is no longer enabled. Rights Management Services allow you to manage access permissions for both Microsoft Office file formats and XPS documents. This common platform for rights management across Microsoft products enables seamless integration when using XPS with other Microsoft applications and services. For example, attaching an XPS document to an email with restricted permissions will automatically apply those restrictions to the attached XPS document. Similarly, saving an XPS document to a Microsoft Office SharePoint document library that has restricted permissions will also automatically apply those restrictions to that XPS document.
This platform integration of Rights Management Services with XPS and other Microsoft products is more cost effective than having to purchase separate rights management offerings to protect each document and publishing format within an organization.
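The tamper detection described under digital signing can be illustrated with a much simpler stand-in. XPS signatures rely on signing certificates; the Python sketch below instead uses a keyed hash (HMAC) from the standard library, which is a different technique with a shared secret key. It shows only the general principle that any change to the signed bytes invalidates the signature. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac


def sign(document: bytes, key: bytes) -> str:
    """Compute a keyed SHA-256 digest over the document bytes."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def is_valid(document: bytes, key: bytes, signature: str) -> bool:
    """Check that the document still matches the signature it was given."""
    return hmac.compare_digest(sign(document, key), signature)


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical shared secret, for illustration only
    original = b"Quarterly report, final version"
    signature = sign(original, key)
    print(is_valid(original, key, signature))
    print(is_valid(original + b" (edited)", key, signature))
```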
3.8
Improved Windows printing experience
In recent years, the capabilities of printers for the personal computer have dramatically improved in resolution, colour, and quality, while costs have dropped significantly. As a result, you can print digital memories at home using inexpensive yet sophisticated colour printers that provide quality output in a matter of seconds. At work, you can use advanced graphics such as transparencies and gradients to bring your sales materials to life. In Windows Vista, XPS is the basis for the new printing infrastructure, and when paired with an XPS-enabled printer, you get a truly next-generation printing experience. Windows Vista and an XPS-enabled printer offer the following benefits:
Improved colour printing
The operating system can communicate a broader range of colour information from applications to inkjet printers that use more than four ink colours (known as wide-gamut printers). The advanced colour capabilities available in XPS make Windows Vista a great platform for printing photos with more lifelike output.
High-fidelity print output
The XPS print infrastructure enables high-fidelity output by reducing or eliminating image data conversions and colour space conversions that typically occur during printing. The benefit for users is that smooth shadings, fades, and glow effects used in modern documents print just as intended, without loss of image fidelity or colour fidelity.
3.9
Technology
Intended as the replacement for the Enhanced Metafile (EMF) format, which is the print spooler format in the GDI print path, the XPS document format is the same as the spooler format used in the XPS print path. It serves as the page description language (PDL) for printers. For printers supporting XPS, this eliminates an intermediate conversion to a printer-specific language, increasing the reliability and fidelity of the printed output. The document format consists of structured XML markup that defines the layout of a document and the visual appearance of each page, along with rendering rules for distributing, archiving, rendering, processing and printing the documents. Notably, the markup language for XPS is a subset of XAML, allowing it to incorporate vector-graphic elements in documents, using XAML to mark up the WPF primitives. The elements used are described in terms of paths and other geometrical primitives.
An XPS file is in fact a ZIP archive using the Open Packaging Convention, which contains the files which make up the document. These include an XML markup file for each page, text, embedded fonts, raster images, 2D vector graphics, as well as the digital rights management information. The contents of an XPS file can be examined simply by renaming it as a ZIP file and then opening it in an application which supports ZIP files.
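Because an XPS file is a ZIP archive, its parts can also be listed programmatically without renaming the file. The Python sketch below uses the standard zipfile module. The helper names are illustrative, and the demo package only imitates the kind of part names found in an XPS file; it is not a valid XPS document.

```python
import io
import zipfile


def list_package_parts(source):
    """Return the sorted part names stored in a ZIP-based package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return sorted(package.namelist())


def make_demo_package():
    """Build a tiny in-memory ZIP with XPS-like part names (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("FixedDocumentSequence.fdseq", "<FixedDocumentSequence/>")
        package.writestr("Documents/1/Pages/1.fpage", "<FixedPage/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_package_parts(make_demo_package()):
        print(name)
```

For a real document, list_package_parts("report.xps") would be called with the path of an existing XPS file (the file name here is hypothetical).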
3.10
Features
As the document format is the same as the spooler format in the XPS print path, XPS encapsulates an exact representation of the printed output. It has support for advanced printing features such as gradients, transparencies, CMYK colour spaces, printer calibration, multiple-ink systems and print schemas. XPS supports the Windows Colour System colour management technology for better colour conversion precision across devices and higher dynamic range. It also includes a software raster image processor (RIP) which is downloadable separately. The print subsystem also has support for named colours, simplifying colour definition for images transmitted to printers supporting those colours. XPS also supports HD Photo images natively for raster images. The XPS print path can automatically calibrate colour profile settings with those being used by the display subsystem. Conversely, XPS print drivers can express the configurable capabilities of the printer, by virtue of the XPS PrintCapabilities class, to enable more fine-grained control of print settings, tuned to the individual printing device.
Applications which use the Windows Presentation Foundation for the display elements can directly print to the XPS print path without the need for image or colour space conversion. The XPS format used in the spool file represents advanced graphics effects such as 3D images, glow effects, and gradients as Windows Presentation Foundation primitives, which are processed by the printer drivers without rasterization, preventing rendering artifacts and reducing computational load. When the legacy GDI Print Path is used, the XPS spool file is used for processing before it is converted to a GDI image to minimize the processing done at raster level.
3.11
Similarities with PDF
Like Adobe's PDF format, XPS is a fixed-layout document format designed to preserve document fidelity, so that documents look the same, and as they are intended, on any device. PDF is based on PostScript whereas XPS is based on XML. XPS is also the spool file format for printers, as PostScript is. Owing to such similarities, XPS is viewed as a potential competitor to PDF. However, PDF includes dynamic capabilities not supported by the XPS format, and will not be replaced by XPS when such capabilities are needed. Microsoft has submitted the XPS specification to the ISO.
3.12
Hardware
XPS has the support of printing companies such as Sharp, Canon, Epson, Hewlett-Packard and Xerox, and of software and hardware companies such as Software Imaging, Pagemark Technology Inc., Informative Graphics Corp. (IGC), NiXPS NV, Zoran and Global Graphics. As of June 1, 2007, devices certified at the "Certified for Windows Vista" level of Windows Logo conformance are required to have XPS drivers for printing.
3.13
Software
NiXPS - supports viewing and printing of XPS files on Mac OS X and Windows.
ExpertXPS - a .NET library for developers, developed by Outside Software Inc., that can generate XPS files instantly.
GhostXPS - an XPS viewer developed by Artifex.
Okular - a document viewer for KDE 4 that supports XPS files.
3.14
Licensing
In order to encourage wide use of the format, Microsoft has released XPS under a royalty-free patent license. The license allows users to create implementations of the specification that read, write and render XPS files, as long as they include a notice within the source that the technologies implemented may be encumbered by patents held by Microsoft. Microsoft also requires a covenant from certain organizations. The license states that if you are "engaged in the business of developing (i) scanners that output XPS Documents; (ii) printers that consume XPS Documents to produce hard-copy output; or (iii) print driver or raster image software products or components thereof that convert XPS Documents for the purpose of producing hard-copy output, you covenant that you and your affiliates will not sue Microsoft or any of its licensees under the XML Paper Specification or customers for infringement of any XML Paper Specification Derived Patents (as defined below) on account of any manufacture, use, sale, offer for sale, importation or other disposition or promotion of any XML Paper Specification implementations." The specification itself is released under a royalty-free copyright license, allowing its free distribution.
3.15
Integrate XPS in Your Application
You can integrate your application with enterprise-wide workflows by adding the ability to publish, import, and view XPS documents. .NET Framework 3.0 provides the APIs that enable you to add XPS-based publishing, importing, and viewing technologies to your Windows Presentation Foundation (WPF) application. Adding XPS-based technologies to your application gives your customers a print quality that was previously found only in high-end graphic arts applications, because the Windows Vista print subsystem has been enhanced to recognize and process XPS documents. XPS-based technologies support innovation and format consistency, so you can build your application with features for the future while retaining the safety net of backward compatibility. With this freedom you can decide how to add XPS-based technologies to your application.
4
WICD CORE 1.0 FILE FORMAT
Compound Document is the W3C term for a document that combines multiple formats. WICD stands for Web Integration Compound Document and is based on the idea of integrating existing markup language formats in preference to inventing new markup. The Compound Document by Reference Framework (CDRF) and Web Integration Compound Document (WICD) Core have dependencies on the following documents:
This Web Integration Compound Document (WICD) Core specification describes rules for combining Extensible Hypertext Markup Language (XHTML), Cascading Style Sheets (CSS), and Scalable Child Content formats, such as Scalable Vector Graphics (SVG), in a device independent manner. WICD Core 1.0 is based upon the Compound Document by Reference Framework 1.0 (CDRF) and serves as a foundation for the creation of rich multimedia content profiles.
CDRF - Compound Document by Reference Framework
CDRF describes generic rules and behaviour for combining sets of standalone XML formats. The Compound Document Framework is language-independent. While it is clearly meant to serve as the basis for integrating W3C's family of XML formats within its Interaction Domain (e.g., CSS, MathML, SMIL, SVG, VoiceXML, XForms, XHTML, XSL) with each other, it can also be used to integrate non-W3C formats with W3C formats or integrate non-W3C formats with other non-W3C formats.
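Combining formats "by reference" means that the root document points to its child content rather than containing it. The Python sketch below assumes an XHTML root document that references an SVG child through an object element; the file name chart.svg and the function find_child_references are hypothetical, and the sketch only locates the references without rendering anything.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small XHTML root document that references an SVG child by URL.
ROOT_DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Compound document sketch</title></head>
  <body>
    <p>Text in the XHTML root document.</p>
    <object type="image/svg+xml" data="chart.svg" width="200" height="100">
      Fallback text shown when the child cannot be rendered.
    </object>
  </body>
</html>"""


def find_child_references(xhtml_source: str):
    """Return (media type, URL) pairs for every object element in the document."""
    root = ET.fromstring(xhtml_source)
    return [
        (element.get("type"), element.get("data"))
        for element in root.iter(f"{{{XHTML_NS}}}object")
    ]


if __name__ == "__main__":
    print(find_child_references(ROOT_DOCUMENT))
```

Keeping the child in its own file lets each format be authored and validated with its own tools, which matches the preference for integrating existing markup languages over inventing new markup.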
5
CONCLUSION
PDF and other file formats are very important in the presentation and sharing of research work. Researchers should therefore understand how these formats work, as this knowledge aids their effective use.
6
REFERENCES
Adobe Systems Incorporated. PDF Reference, Sixth edition, version 1.7 (30 MB), p. 33.
Orion, Egan (2007-12-05). PDF 1.7 is approved as ISO 32000 (HTML). The Inquirer. Retrieved on 2008-04-05.
Adobe wins backing for PDF 1.7. vnunet.com.
Laurens Leurs. The history of PDF. Retrieved on 2007-09-19.
Wisdom of the PDF Sage » History of PDF Openness.
Developer Resources.
Adobe Systems. PDF Reference, p. 51.
Adobe Systems. PDF Reference, pp. 39-40.
Adobe – PDF Developer Centre: PDF reference. http://www.adobe.com/devnet/pdf/pdf_reference.html
PDF Blend Modes Addendum.
AIIM (2006-10-20). New Best Practices Guide Addresses Exchange of Healthcare Information. Retrieved on 2007-03-09.
Jackson, Joab (2006-12-07). Adobe plunges PDF into XML. Government Computer News. Retrieved on 2008-01-12.
Adobe Forums. ANNOUNCEMENT: PDF Attachment Virus "Peachy", 15 August 2001.
New features and issues addressed in the Acrobat 7.0.5 Update (Acrobat and Adobe Reader for Windows and Mac OS).
http://www.planetpdf.com/planetpdf/pdfs/pdf2k/03e/merz_fontaquarium.pdf