
Adobe PDF Library Developer Overview


Datalogics® Adobe PDF Library v7.0 Developer Overview

This guide is part of the Adobe PDF Library v7.0.5 suite; 08/17/06.

Copyright 1999-2006 Datalogics Incorporated. All Rights Reserved. Use of Datalogics software is subject to the applicable license agreement.

Datalogics Interface (DLI) is a trademark of Datalogics Incorporated. Other products mentioned herein as Datalogics products are also trademarks or registered trademarks of Datalogics, Incorporated. Adobe, Adobe PDF Library, PostScript, Acrobat, Distiller, Exchange and Reader are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States or other countries. HP and HP-UX are registered trademarks of Hewlett Packard Corporation. IBM, AIX, AS/400, OS/400, MVS, and OS/390 are registered trademarks of International Business Machines. Java, J2EE, J2SE, J2ME, all Java-based marks, Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows and Windows NT are trademarks or registered trademarks of Microsoft Corporation. SAS/C is a registered trademark of SAS Institute Inc. UNIX is a registered trademark of The Open Group. All other trademarks and registered trademarks are the property of their respective owners.
For additional information, contact:

Datalogics, Incorporated
101 North Wacker Drive, Suite 1800
Chicago, Illinois 60606-7301
Phone: 312-853-8200
Fax: 312-853-8282
www.datalogics.com
[email protected]

Table of Contents

1 About This Guide
   The Purpose of this Guide
   Introduction
   What You Should Know
   How This Book is Organized
   Document Conventions
   Related Documentation
   What’s New in This Release
   What’s New in Previous Releases
2 Design Overview
   Introduction
   Adobe PDF Library Version Control
   Adobe PDF Library for OS/390, OS/400 and Selected UNIX Platforms
   Creating a PDF Document
   How the PDF Library Operates
   Enhancements for the OS/390 and OS/400 Environments
   Assembler Interface
   Platform-Specific Concerns
3 Structure Reference
   Interface Structure Summary
   Job Interface
   Document Interface
   Page Interface
   Graphical Interface
4 PDF Functionality
   OS/390 Examples
   PDF Document Options
   Compatibility Between PDF Documents
   Compatibility with External Applications
   Optimizing Performance
   Print Issues
5 Data Conversion
   Translations

1 About This Guide

This chapter explains the scope and content of this guide and provides developers with an introduction to the Adobe PDF Library. Experienced users may want to skip directly to the section “What’s New in This Release” on page 1.7 for information on the latest enhancements and additions.

The Purpose of this Guide

This guide is designed to help developers incorporate the API calls for the Adobe PDF Library and DLI into their composition applications.

NOTE: Datalogics ports the Adobe PDF Library and DLI to the OS/390, OS/400 and certain additional UNIX platforms, but also supports the products on the original Adobe-selected Windows, Macintosh and UNIX platforms.
Except where noted, the information here applies to all platforms, not just Datalogics ports.

Introduction

The Adobe PDF Library is a collection of object-oriented routines offering an Application Programming Interface (API) that enables your composition application to produce PostScript (PS) or Portable Document Format (PDF) Page Description Language (PDL) files. PDF files are compact, device-independent files that provide efficient electronic distribution of large documents, either for viewing with Adobe Acrobat, Adobe Reader and similar PDF viewers, or for output to high-speed printers.

As with the original Adobe release on Windows, Macintosh and certain UNIX platforms, the Datalogics-supplied Adobe PDF Library and DLI allow you to control the manipulation of PDF files. The Library can be linked to your composition application to manipulate and produce PDF files, and can also add private data to the PDF output.

NOTE: The Adobe PDF Library cannot be used to display PDF files on the OS/390 and OS/400 operating systems.

From the User’s Viewpoint

The Adobe PDF Library does not require a Graphical User Interface, and applications built with it do not require the Adobe Acrobat or Adobe Reader viewers to operate. (You can also create a Windows viewing application using the Adobe PDF Library. Ask your Sales or Support representative for a copy of the DLViewer sample application.)

Applications interact with the Adobe PDF Library through calls to library-defined entry points (the API).
To the application, the Adobe PDF Library appears as a collection of objects and their applicable methods, in three categories:

1 Objects that persist, either in memory or in the object store, until a document is completed and freed
2 A collection of objects used to communicate with the Adobe PDF Library; these have a short life span and are terminated by a call into the Library
3 A Library Control Object, the PDFLDataRec structure, which is created before the Adobe PDF Library is initialized, populated before and during Library initialization, and cleared and released during and after Library termination

What You Should Know

This book is not for developers new to application programming interfaces; therefore, it does not describe programming concepts and techniques. The following list describes the level of experience or knowledge required to understand this book:

• Familiarity with high-level language Application Programming Interfaces (APIs), programming on the relevant operating system with the corresponding development tools for that platform, and the process of writing applications in general
• A general understanding of the structure and contents of PDF files; the PostScript language; font management (a font being the complete set of characters of a particular design) and typography (the style, arrangement, and appearance of typeset matter for print or electronic display); and composition processes in the platform environment

Your application should be capable of the following, where appropriate:

• Applications that create PDF should be capable of providing the x and y page coordinates for each object to be placed on the page. By default, coordinates are based on the "first quadrant," where (0,0) is the lower-left corner of the page. Applications that consume or modify PDF will be supplied the x and y page coordinates for each component addressed.
• Applications generating textual objects must be capable of specifying font and point size information for those objects. The Adobe PDF Library itself is not a composition engine; its purpose is to produce Adobe PDF-compliant code as directed by the application, according to the typesetting information provided by that application.

You should have access to the Adobe PDF Library Application Programming Interface (API) manual, related Datalogics Interface documentation, and the Adobe PDF Specification manual for your system. You should find these documents within your release, accessible via the referencelibrary.pdf document using the copy of Adobe Reader provided (or any other PDF viewer). For Adobe PDF Library v6.x releases, Adobe PDF Specification 1.5 is appropriate. For Adobe PDF Library v7.x releases, Adobe PDF Specification 1.6 is appropriate.

NOTE: Some structures permitted in Adobe PDF Specification 1.6 may not be permitted in Adobe PDF Specification 1.5, and some structures defined in Adobe PDF Specification 1.5 are not available in Adobe PDF Specification 1.4.

The explanations, assumptions and samples provided in this guide refer to Adobe PDF Library v7.0.5 and DLI v7.0.5 or higher.

How This Book is Organized

The following list outlines the chapters and briefly describes their contents.

Chapter 1: "About This Guide" (this chapter) outlines the chapters to follow, explains the document conventions used here, and lists other related documentation that you may find useful for your work.

Chapter 2: "Design Overview" introduces the design of the components of the Adobe PDF Library for various platforms, explains the function of the Adobe PDF Library, and summarizes concepts and the most common objects and methods used.
Chapter 3: "Structure Reference" lists the functional specifications for each interface. These object and method names are described in depth in the Acrobat Core API On-line Reference document.

Chapter 4: "PDF Functionality" identifies some Adobe PDF Library functionality and provides sample code to illustrate usage. It also discusses performance optimization and print issues.

Chapter 5: "Data Conversion" explains the translation process from EBCDIC to ASCII.

Document Conventions

The terms note, link and bookmark are used in this book the same way they are in the user interface of Adobe PDF Library v7.0®, Adobe Acrobat® and Adobe Reader®. These correspond to the text annotation, link annotation and outline entry structures (respectively) that appear in a PDF file. See the Portable Document Format Reference Manual for a description of the PDF file format.

The following documentation conventions appear throughout the manual to help you differentiate regular text from product and program names, and to distinguish command syntax.

• Product and program names are set in italic type.
• Multi-line examples are separated from the text and set in Courier monospace.
• Directory names and filenames are contained within the text and set in Courier monospace.
• Commands are contained within the text and set in Courier monospace.
• New terms are italicized.
• Page numbers in this book do not correspond to page numbers in the PDF file. The numbering scheme (e.g. 4.1 or A.10) gives the chapter number (4) or appendix letter (A) first, followed by the page number (1 or 10), separated by a period.

Related Documentation

The following documents will be useful in developing applications using DLI.

Datalogics Resources

Adobe PDF Library and DLI Installation Guide
This document describes the installation requirements for using the Adobe PDF Library and DLI on the various platforms to which Datalogics has ported these products.
Adobe PDF Library Developer Overview (this book)
This document is designed to help developers incorporate the API calls for the Adobe PDF Library into their composition applications.

DLI Implementation and Reference Guide
This document details the Datalogics Interface, a simplified interface to the COS Layer of the Adobe PDF Library.

Java Interface User Guide
This document details the Datalogics Java Interface, a Java-language wrapper interface to the Adobe PDF Library and DLI.

Adobe Resources

The following documents are distributed by Adobe as part of the original Adobe PDF Library release, and are redistributed by Datalogics without alteration. These and other documents may also be found on the Adobe website at http://partners.adobe.com/asn/acrobat/technotes.jsp. (Descriptions below are provided by Adobe as part of their original accompanying readme.txt file.)

NOTE: Adobe Solutions Network (ASN) membership may be required in order to access some material on the Adobe website. See http://partners.adobe.com/asn/programs/developer/index.jsp for more details.

Portable Document Format Reference Manual
This document describes the PDF Standard 1.6 specification. The latest version may be found at http://partners.adobe.com/public/developer/pdf/index_reference.html.

NOTE: Adobe also provides an accompanying errata file for this manual, with last-minute updates and corrections. One copy is provided with this documentation (see your documentation file folder), and you can check for newer copies at http://partners.adobe.com/public/developer/en/pdf/PDF16Errata.pdf

Adobe PDF Library Overview
This guide provides background and development information for the Adobe PDF Library. Read this document before beginning development for information such as supported platforms, known issues and development requirements.

Acrobat and PDF Library API Overview
This guide provides an overview of the Acrobat API in general.
It covers information applicable to both plug-in development and PDF Library development. Read this document to obtain an understanding of how the Acrobat API is organized.

Acrobat and PDF Library API Reference
This is the reference manual for all of the Acrobat API methods made available to the Acrobat viewer and the Adobe PDF Library. It documents the parameters, return values and availability of each method, as well as specific implementation notes. This document is useful while developing with the Adobe PDF Library, or while planning development, to determine method availability and capabilities.

NOTE: As of the v7.x release series, the former "PDF Library Supplement to the Acrobat Core API" (Technical Note #5414) is combined with this manual, as a separate section near the back of the main book.

3D Annotations Tutorial
As its Introduction explains, "This tutorial describes how to programmatically create 3D annotations in PDF files. The code [...] can be used as part of an application developed using the Adobe PDF Library."

SnippetRunner Cookbook
This documents the SnippetRunner sample application, a development tool provided by Adobe for rapidly developing new functions using Library methods and testing them within the context of a working Library application.

What’s New in This Release

This section highlights new additions and enhancements to this guide. The Developer Guide is intended to address programming practices more than Library operation, so the What’s New section of this guide highlights changes to the Developer Guide itself, rather than changes and enhancements to the Adobe PDF Library or DLI releases. (For new Adobe PDF Library features, see the Adobe PDF Library Overview. For new DLI features, see the DLI Implementation and Reference Guide.)
You should also check the accompanying Release Notes file (typically ReleaseNotes.pdf) and readme.txt files, if present (one readme.txt accompanies the software release files and another the documentation files, in their respective folders or directories). Release Notes contain fixes and enhancements, usually resulting from past problem reports; the readme.txt files typically contain last-minute information on the current release of the software or the documentation files.

Minor version upgrades may be made as running changes rather than full releases, so the version or subversion number of your release may be newer than those listed here. See the accompanying readme.txt file for the very latest changes and enhancements.

v7.0.5

New PDWordGetCharPoint Method

A new PDWordGetCharPoint method has been created, accepting a PDWord, a byte index into the word, and a pointer to an ASPoint structure to fill. This method was previously in development for certain customer sites as "PDWordGetNthCharPoint," but the name PDWordGetCharPoint was chosen as more in keeping with standard Adobe method-naming practices.

The method is intended to enhance the WordFinder feature in the Adobe PDF Library by providing additional information about the typesetting characteristics of the text that forms a word, specifically the baseline shift used for superscripts and subscripts. For example, if someone describes a temperature measurement of "78°" (78 degrees) using a superscript ‘o’ (LATIN SMALL LETTER O), the current implementation of WordFinder returns "78o," which may be confused with a mangled or mistyped 780 (seven hundred eighty), with no way to understand the semantic meaning of this character sequence.
The new method PDWordGetCharPoint accepts as input a PDWord object and an index specifying the character position within the PDWord (the first character has an index of 0), and fills in the ASPoint structure whose address is passed as the third argument:

   PDWordGetCharPoint(pdWord, index, &point);

The returned point represents the position at which the specified character is placed, taking into account the current transformation matrix, the current text matrix, text rise, and other parameters affecting the character’s position.

The values returned from PDWordGetCharPoint may be used to calculate the vector representing the baseline of the PDWord as follows:

1 Obtain each character’s font size.
2 Determine the font size used most often, and collect the list of characters with this font size. If no font size is used most often, use the largest font size in the PDWord.
3 The baseline is a vector extending from the placement position of the first character in the list created in step 2, through the placement position of the last character.

Superscript and subscript characters will have placement positions that do not lie on the calculated baseline vector. Other, normally placed characters will lie on the baseline, unless the word is not typeset in a straight line; that situation is currently not supported by this method.

For a PDWord in which no font size is used most often, the baseline may instead be calculated as the vector extending from the placement position of the largest letter, through the placement position of the largest character in the next word on the line; the baseline of the last word on a line is defined as a vector with the same direction as the baseline of the previous word on the line.

What’s New in Previous Releases

A summary of enhancements in prior releases follows.

v7.0.1

PDF Functionality
The former "Samples" chapter was retitled "PDF Functionality," and a new section on print issues was added.
2 Design Overview

This chapter introduces the design of the components of the Adobe PDF Library. It explains the function of the Library and summarizes the concepts and the most common objects and methods used.

Introduction

Adobe has developed a library of software routines to support the creation and maintenance of PDF files. The Adobe PDF Library consists of several hundred object modules and more than 250,000 lines of code. The Library allows programmers to develop applications that create PDF output directly, without having to go through a distillation process after the application has completed execution.

The Adobe PDF Library can produce either PDF files or PostScript files. PDF files generated by the Adobe PDF Library appear the same as if the file had been printed from the original application’s print stream. Furthermore, a PostScript print-out of a page appears identical to the generated PDF of the same page.

The Library has a well-defined Application Program Interface (API), and can be used directly in several processing environments, including Windows, Mac, OS/390, OS/400, and numerous UNIX platforms.

There have always been three ways to create PDF files:

• Distillation (via Adobe Acrobat Distiller or Adobe Normalizer Server)
• Conversion
• Proprietary output modules or emitters

The Adobe PDF Library offers a fourth option: native generation of true Adobe PDF from within your own applications. Like those generated by Adobe Acrobat Distiller or Adobe Normalizer Server, these files are fully functional and capable of supporting the full range of PDF features. As with proprietary modules, PDF generation takes place within the application, with access to the whole data set, and does not require additional post-production steps.
The Library is maintained by Adobe and Datalogics, and is kept current with the PDF standard, supporting the latest enhancements.

The Library also provides facilities to read existing PDF documents, navigate their content, and extract or modify portions of the document. The routines needed to combine documents, modify privileges, compress or linearize non-compressed documents, or convert existing documents to PostScript are all present in the Library.

Adobe PDF Library Version Control

The Adobe PDF Library is a set of routines associated with other Adobe products such as Adobe Acrobat, Adobe Acrobat Distiller and Adobe Reader. The original Adobe PDF Library was labeled version 1.2 for consistency with the PDF 1.2 standard. The next version of the Library was labeled version 4.0 (and subsequently 4.05) because of its tie-in with Acrobat v4.0. Versions 4.0/4.05 of the Library complied with PDF Standard 1.3. Adobe PDF Library v5.0.2Plus complied with PDF Standard 1.4, and Adobe PDF Library v6.x complied with PDF Standard 1.5. The current Adobe PDF Library v7.x complies with PDF Standard 1.6.

In addition, releases of the Adobe PDF Library from v6.1.0 onwards are thread-safe, and thus can be run in multi-threaded applications without the need for separate mutex (mutual exclusion) coding.

PDF Level Declarations in Output

By default, the Adobe PDF Library declares its current PDF compliance level in output PDF files; e.g. Adobe PDF Library v7.x applications building pages without Datalogics Interface methods will generate PDF v1.6. The description below applies only to PDF output generated via DLI methods.

NOTE: Viewing a latest-generation PDF in a prior-version Adobe Acrobat or Adobe Reader (e.g. viewing PDF v1.6 in Acrobat or Reader v6.0 or earlier) can produce a pop-up warning of a PDF level mismatch, to warn the user that certain PDF features new to that version may not be supported by the viewing program.
While this is only a warning, it may concern some users who have not upgraded to the latest viewer. Where possible, ensure that your document features are backwards-compatible with older viewer versions; otherwise, warn your users that a viewer upgrade will be required to take full advantage of the features in your document.

PDF Level Declarations via DLI

Applications built with the Adobe PDF Library and Datalogics Interface (e.g. Adobe PDF Library v7.0.1Plus and DLI v7.0.1) that use DLI methods for output will have their output PDF compliance level set appropriately by DLI. By default, DLI-generated files identify themselves as PDF v1.3 compliant, or as a higher level if appropriate, based on the functionality embedded in the document. Please consult Chapter 2 of the DLI Implementation and Reference Guide for more details on Adobe PDF Library PDF version control.

Adobe PDF Library for OS/390, OS/400 and Selected UNIX Platforms

Adobe has contracted with Datalogics to port the Adobe PDF Library to IBM MVS (Multiple Virtual Storage, or OS/390), IBM AS/400 (or OS/400), and certain UNIX environments in addition to those for which Adobe already provides its own builds. Datalogics distributes the product in these environments, and offers support for all of the available platform environments.

Adobe PDF Library for OS/390 and OS/400 contains the same modules that are available in the other processing environments, and produces identical PDF files from the same API calls and input data stream. However, the routines that create and maintain a graphical user interface have not been ported to the OS/390 or OS/400 environments.
Adobe PDF Library for OS/390 and OS/400 can be used as a collection of modules accessed either directly through the standard API (for OS/390 applications written in SAS/C) or indirectly through an interface component consisting of macros that simplify the calling sequence to the API (for OS/390 applications written in Assembler). For an example using macros, refer to DL.PDFLIB.Vxxx.COPY(@DLPDFD), which contains the layout for all of the Assembler macros.

NOTE: Throughout this guide, vxxx is used to denote the version number of the product, such as Adobe PDF Library v6.1.1Plus. This convention applies not only to the product itself, but also to any generated files or directories that are associated with and contain that version number.

The function names in this PDS member match the names of the documented Adobe PDF Library and DLI API calls, except for their case: the macros are upper-case to support certain Assembler debuggers that do not properly handle lower-case characters. The following code samples demonstrate the relationship between Assembler macros and the C API calls.

Assembler:

         @DLPDFC                                                       *
               DLFUNC=PDFLINIT,                                        *
               DLRETURN=RTN,                                           *
               DLPARMS=(ALIBREC)

C:

   rtn = PDFLInit(&LibRec);

The above example shows how the function, return value and parameter list of a C API call are expressed through the @DLPDFC Assembler macro.

Adobe PDF Library is written in C and C++. The Library is a collection of functions that perform specific tasks to create, import, merge, linearize, optimize, compress, and encrypt PDF files that conform to the latest PDF standard. An API (written entirely in C) is provided that exposes the Library’s high-level interface to applications. The Assembler interface (for OS/390) provides a set of macros that simplify calling the API functions in the Library.
Developers using the Library have access to complete examples of simple document creation that explain the use of the macros. These samples demonstrate tested methods of using macros for basic API calls to the Library. Additionally, this guide describes each API function in the Library, including its returned values, its calling parameters, and the order in which they are specified. The Assembler macros provide access to all functions exposed in the API via header files for C programs. The examples for both Assembler and C are located elsewhere in this manual; start with “OS/390 Examples” on page 4.2 for more details.

NOTE: For clients who are using native UNIX compilers, the run-time libraries for each of these platforms are available from the Free Software Foundation. Any clients wishing to integrate the Adobe PDF Library and DLI components with their natively compiled applications can retrieve these run-time libraries free of charge from the Free Software Foundation at http://www.gnu.org/software/gcc/

Datalogics developed the Datalogics Interface (DL Interface, or DLI) to enhance performance in creating PDF. It allows the application to bypass the PDFEdit Layer and work directly with the Carousel Object System (COS) Layer of the Library, thus minimizing the time needed to perform tasks. For more information about the DL Interface, refer to the DLI Implementation and Reference Guide.

Creating a PDF Document

The Adobe PDF Library is a useful tool for creating PostScript and PDF output in a variety of industries. PDF documents can be created at the time of composition or by transforming legacy print streams. Customers therefore have the option of creating PostScript or PDF from applications, or of taking previously created output (except PostScript) and converting it to PDF.
NOTE: Conversion of PostScript to PDF is not supported, since Adobe Distiller and Adobe Normalizer Server are available for this purpose. Further information on Adobe Normalizer Server is available from the Datalogics website at http://www.datalogics.com/pdflibrary-normalizer.asp.

Once the PDF has been created, it can be modified using Acrobat or other plug-ins, since it is true Adobe PDF. Additional customer applications may be required to transform legacy print streams.

To create a simple PDF document using the Adobe PDF Library, follow these basic composition steps:

1 For non-OS/390 systems, allocate memory for, and specify the location of, the application-specific font resource
2 Initialize the Library
3 Create a document
4 Create the content object for a page
5 Create text
6 Add text to the page
7 Release the text
8 Include another PDF document as a graphic on the page (optional)
9 Include an image in the page (optional)
10 Add the page to the document
11 Release the content object for the page
12 Repeat as needed
13 Output the document
14 Release the document
15 Terminate the Library
16 Release any allocated memory

PDF files can be created at the complete output stream level, at a user-defined document level, or any combination of the two. For example, in a statement application, one PDF file could be created for the whole run, or one PDF file could be created for each statement. Using the concept of PDF threads, it is also possible to create multiple output sorts from the same application execution, and to apply different file definitions to each output. This could be used, for instance, to create one PDF file sorted for printed output and mail delivery, and another sorted by sales territory and destined for online access.

How the PDF Library Operates

The Adobe PDF Library generates output files in the PostScript and PDF page description languages in response to calls that the application makes to the Library.
The following diagram illustrates how data flows between the composition application and the Library:

[Figure: Adobe PDF Library Data Flow. Labels: Composition Application; Data and Instructions; Callbacks; Source Data; Document Cache; Document Files; Adobe PDF Library; PDF; PostScript]

Design of the Adobe PDF Library

The Adobe PDF Library is a high-level C-language library containing an object-oriented collection of routines that construct a representation of a set of pages and their content. This representation is independent of any schema for the creation or display of pages. The Library contains objects that represent collections of information (pages, areas, strings and images) and presentations of text (fonts, colors, sizes, rules, etc.).

At present, the Adobe PDF Library for OS/390 is supplied as a dynamically linked component of the application, and the Library for OS/400 is supplied as dynamically linked service programs.

The API organizes document components into objects that applications can manipulate. These objects fall into two categories:

• Objects that persist within the Library, either in memory or in the object store, until a document is completed and freed.
• Objects used to communicate with the Library. These objects are allocated in, and may be directly manipulated by, the application.

The following is a typical diagram of a composition application.
Composition Application Composition Application Control Layer Data Manipulation Data Input Output Generation Marks, Text and Attributes Adobe PDF Library Callback Area Callback Information Source Data Instructions 2.8 Design Overview Structure of a Call The following diagram illustrates call structures within an application: Structure of a Call Composition Application Initialize Library Start Document Process Each Page by Line Segment Write Document Terminate Library W h a t t h e Adobe PDF Library G e n e r a t e s The Library may produce a set of pages in PDF or PostScript which reflects the state of the defined objects as the constituents of a document. A single document may be output in many forms, such as Linearized PDF, normal PDF, and PostScript from a single creation of a document. The routines that create, destroy, access, or modify objects within the Library are defined in C language .h files which define the exposed API to the Adobe PDF Library. 2.9 2.10 A dobe P DF Lib ra r y De v el ope r Ov er v ie w Summary of the Most Common Objects The Library supplies not only the ability to create PDF and PostScript, but also the ability to reorganize and reorder pages without recomposition. The following table summarizes the most common objects used in creating applications. Use this as a roadmap to define the functionality you want your application to have. Table 2-1: Most Common Objects in Applications Object Description Document This object creates or reads a document as input and writes either PDF (which may be read again) or PostScript (which may not be re-read) as output. Page This term denotes a single side of a single sheet of media. Among its properties, a page may be moved from one document to another or re-sequenced within a document, and annotations may be added or removed. 
Container | Provides structural grouping.
Graphical | A variety of objects that make a mark on the page, such as text, lines, backgrounds and pictures.
Path | Mechanism for creating arbitrary line drawings in PDF.
Image | Renders graphics for PDF.
Form | An arbitrarily complex collection of other graphic operators, which may be positioned and scaled freely.
Bookmark | A collection of bookmark operators that acts like a table of contents with live links in an electronic display.
Thread | Collects non-contiguous elements within a document into a stream.
Font | Describes a specific font and encoding needed to image text.
Thumbnail | A low-resolution bitmap of a page, used to identify the page's contents.

Enhancements for the OS/390 and OS/400 Environments

The following table lists the code enhancements required for the Adobe PDF Library to run in the OS/390 and OS/400 environments. Datalogics added these functions as a convenience to facilitate use of the Library.

Table 2-2: Code Enhancements for OS/390 and OS/400 Environment
Enhancement | Description
dl_etoa | EBCDIC to ASCII conversion routine.
dl_atoe | ASCII to EBCDIC conversion routine.
DLRpt | C routine that accepts a pointer to a NULL-terminated string as input. Used to report error conditions and as a debugging aid. Requires an output DD named DLOUT, which is opened for append, written to, and closed on each call to DLRpt.
IBM LibASCII | Used by the Library on OS/400.

Customers may choose to use their own routines in place of the ones listed above.

Assembler Interface

Datalogics provides several Assembler macros and copy members to facilitate use of the Adobe PDF Library for OS/390 from an Assembler application. The EQDL file, which contains equates, should be included (via the COPY statement) in the Assembler source file that uses the Adobe PDF Library. In addition, the @DLCONS1 file, which contains constants required by the Library, should be included.
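The dl_etoa and dl_atoe routines in Table 2-2 are not documented here in detail. As a rough illustration of what such a conversion involves, the following C sketch converts a buffer in place between EBCDIC code page 037 and ASCII. It is a hypothetical sketch, not the Datalogics implementation: the names sketch_etoa and sketch_atoe are invented for illustration, only the space character, digits and unaccented letters are mapped, and every other byte becomes a question mark. Numeric byte values are used instead of character literals so the code behaves the same on an ASCII or an EBCDIC host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the Datalogics dl_etoa/dl_atoe routines.
   Maps only space, digits and A-Z/a-z between EBCDIC code page 037
   and ASCII; any other byte becomes '?'. */

static unsigned char ebcdic_to_ascii_byte(unsigned char e)
{
    if (e == 0x40) return 0x20;                                     /* space */
    if (e >= 0xF0 && e <= 0xF9) return (unsigned char)(0x30 + (e - 0xF0)); /* 0-9 */
    if (e >= 0xC1 && e <= 0xC9) return (unsigned char)(0x41 + (e - 0xC1)); /* A-I */
    if (e >= 0xD1 && e <= 0xD9) return (unsigned char)(0x4A + (e - 0xD1)); /* J-R */
    if (e >= 0xE2 && e <= 0xE9) return (unsigned char)(0x53 + (e - 0xE2)); /* S-Z */
    if (e >= 0x81 && e <= 0x89) return (unsigned char)(0x61 + (e - 0x81)); /* a-i */
    if (e >= 0x91 && e <= 0x99) return (unsigned char)(0x6A + (e - 0x91)); /* j-r */
    if (e >= 0xA2 && e <= 0xA9) return (unsigned char)(0x73 + (e - 0xA2)); /* s-z */
    return 0x3F;                                                    /* '?' in ASCII */
}

static unsigned char ascii_to_ebcdic_byte(unsigned char a)
{
    if (a == 0x20) return 0x40;                                     /* space */
    if (a >= 0x30 && a <= 0x39) return (unsigned char)(0xF0 + (a - 0x30)); /* 0-9 */
    if (a >= 0x41 && a <= 0x49) return (unsigned char)(0xC1 + (a - 0x41)); /* A-I */
    if (a >= 0x4A && a <= 0x52) return (unsigned char)(0xD1 + (a - 0x4A)); /* J-R */
    if (a >= 0x53 && a <= 0x5A) return (unsigned char)(0xE2 + (a - 0x53)); /* S-Z */
    if (a >= 0x61 && a <= 0x69) return (unsigned char)(0x81 + (a - 0x61)); /* a-i */
    if (a >= 0x6A && a <= 0x72) return (unsigned char)(0x91 + (a - 0x6A)); /* j-r */
    if (a >= 0x73 && a <= 0x7A) return (unsigned char)(0xA2 + (a - 0x73)); /* s-z */
    return 0x6F;                                                    /* '?' in CP037 */
}

/* Converts len bytes of buf from EBCDIC to ASCII in place. */
static void sketch_etoa(unsigned char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = ebcdic_to_ascii_byte(buf[i]);
}

/* Converts len bytes of buf from ASCII to EBCDIC in place. */
static void sketch_atoe(unsigned char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = ascii_to_ebcdic_byte(buf[i]);
}
```

A production routine would use a full 256-entry table for the code page actually in use; the range checks here only keep the sketch short.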
Datalogics has kept the names of functions and structures in the Assembler environment consistent with those defined in the Adobe documentation for the Adobe PDF Library. This is intended to simplify learning the Library interface. It should also ease moving between multiple applications that use the Adobe PDF Library but are written in different languages.

NOTE: To accommodate some Assembler debugging tools that do not properly handle lower-case characters, all function and structure names have been converted to uppercase.

For the Assembler interface, Datalogics provides the following macros for use by the OS/390 Assembler programmer:

Table 2-3: Assembler Macros
Macro | Description
@DLPDFC | Used to invoke the C functions that make up the Adobe PDF Library.
@DLPDFS | Used for structure allocation and DSECT definition (controlled by the user through the DSECT= option).
@DLPDFP | Parameter block definition macro. Generates the C function parameter list definition used by @DLPDFC, and is called once within the application that invokes the Adobe PDF Library.
@DLPDFD | Contains the definitions of the Adobe PDF Library C functions and parameter lists. It is called from @DLPDFC once, when @DLPDFC is first invoked.
@DLPDFX | Called from @DLPDFD for each defined C function.

Datalogics generates the equates and macros programmatically, based on the functions exposed in the SDK for the Adobe PDF Library. This allows user applications to pick up the latest version of the Adobe PDF Library by reassembling with the most recent versions of the macros and copy members. The macros include parameter count and type checking, which simplifies integration if the calling sequence of a function in the Adobe PDF Library ever changes.
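Table 2-2 describes DLRpt as opening its output for append, writing to it, and closing it again on every call. The sketch below reproduces that pattern in portable C. It is hypothetical: sketch_report is not part of the Library, and because DD allocations are specific to OS/390, it takes a file path where the real routine uses the DLOUT DD. Closing after each call presumably means that messages already written survive a later failure of the application.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical DLRpt-style reporter. Each call opens the report file
   for append, writes one line, and closes it again.
   Returns 0 on success, -1 on any I/O failure. */
static int sketch_report(const char *path, const char *message)
{
    FILE *out;
    assert(path != NULL && message != NULL);
    out = fopen(path, "a");          /* open for append on every call */
    if (out == NULL)
        return -1;
    if (fprintf(out, "%s\n", message) < 0) {
        fclose(out);
        return -1;
    }
    return fclose(out) == 0 ? 0 : -1; /* close so the line is flushed */
}
```

The cost of this design is an open and close per message, which is acceptable for error reporting and debugging but would be slow for high-volume logging.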
Platform-Specific Concerns

Windows
• By default, the Adobe v7.0 build of the Windows (Win32) installation places its LIB and DLL files under the \Program Files tree, in \Libs subfolders. When unpacking the accompanying DLI release, make sure it is unpacked into the same top-level folder as the APDFL release, so that it creates its own similar, parallel tree structure below it (for example, similar \Include and \Libs folders).
• By contrast, the Datalogics v7.0Plus build of the Windows (Win32) installation places its LIB and DLL files under the \Datalogics tree by default, in \Libs subfolders for both the Adobe PDF Library and DLI.

Make sure that the latest versions from your release are not accidentally superseded by older versions that may reside elsewhere on your machine (for example, under C:\Windows\System32), which can happen when a PATH value inadvertently points to older files at build time or run time. Review file locations and PATH definitions to ensure that your application locates the correct library files at build time and the correct DLL files at run time.

OS/400
On OS/400, the Adobe PDF Library may be active in only one process per activation group at any point in time. Failure to observe this restriction can result in unexpected and inconsistent failures within Adobe PDF Library modules. The restriction is caused by the same internal constraints that, in versions prior to Adobe PDF Library v6.1.1, prevented the Library from being thread-safe.

The Adobe PDF Library service programs are built to use the *CALLER activation group setting. Datalogics recommends that all applications invoking the Adobe PDF Library be built with the default *NEW activation group setting.

3 Structure Reference

The functional specifications for each interface are listed in this chapter.
These object and method names are described in depth in the Acrobat Core API On-line Reference manual.

Interface Structure Summary

The tables in this chapter briefly describe the high-level functional specifications for each interface, listed in sequential order. These elements are not the whole of the user interface, only the subset most often used to create documents. The complete set of methods is documented in the Acrobat Core API On-line Reference Technical Note.

The interface routines of the Adobe PDF Library are:
• Job Interface: routines that initialize and terminate the Library and create a work area for it
• Document Interface: routines covering the following properties: Document File, Thread, Bookmark, Page Mode, Font List and Document Information
• Page Interface: routines that create, access or destroy pages of data
• Graphical Interface: routines that create and manipulate text, lines, backgrounds, and pictures on a page

Job Interface

The structure begins with the job interface routines. The first step builds a definition of the work area for use by the Adobe PDF Library. This is a single data structure named PDFLDataRec, which must be created in the user space before the Library is initialized and must persist in memory for as long as the Library is in use. If local memory allocation routines are to be used in place of the built-in routines, pointers to those routines must be placed in this record.

Table 3-4: Job Interface—Methods
Order | Method | Description
1 | PDFLDataRec(Client, Size) | A single data structure that defines the work area to be used by the Library.
2 | AsMemAllocProc(Client, Size) | When used, instructs the Library to call the named client routine rather than malloc() to obtain memory.
3 | AsMemReallocProc(Client, Size) | When used, instructs the Library to call the named client routine rather than realloc() to increase the size of a memory block.
4 | AsMemFreeProc(Client, Size) | When used, instructs the Library to call the named client routine rather than free() to return memory to the pool.
5 | AsMemAvailProc(Client, Size) | When used, instructs the Library to call the named client routine rather than avail() to test for the availability of memory.
6 | PDFLInit(PDFLData*) | Initializes the Library for use. The PDFLData argument refers to an area of memory that the Library may use as a work area. NOTE: There must be exactly one of these calls, made before any other call to the Library.
7 | PDFLTerm() | Terminates the Library and frees all resources it used. NOTE: No other calls to the Library may follow this call.
8 | PDFLGetVersion() | Tests whether the version of the Library used when the executable was built is compatible with the version used at run time. The result indicates whether it is the same version, an earlier version, or a later version.

Document Interface

The Adobe PDF Library can create and manipulate one or more document objects, and any number of documents may be active at a given time. A document may either be read into the Library from an external source or be created entirely within the Library. The document interface routines follow the job interface routines.
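Returning to the memory procedures in Table 3-4: replacement routines are supplied by placing pointers to them in the PDFLDataRec, which must persist for as long as the Library is in use. The sketch below models that idea with a hypothetical SketchMemProcs record (its names and signatures are illustrative, not the SDK's actual fields) and a set of client routines that count live blocks, so a program can check that everything allocated through the record was freed. The lib_* functions are likewise hypothetical stand-ins for how a library might route its own requests, falling back to malloc(), realloc() and free() when no procedure is supplied.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of memory procedures; not the real PDFLDataRec. */
typedef void *(*SketchAllocProc)(void *client, size_t size);
typedef void *(*SketchReallocProc)(void *client, void *block, size_t size);
typedef void  (*SketchFreeProc)(void *client, void *block);

typedef struct {
    void             *client;   /* handed back to every procedure */
    SketchAllocProc   alloc;    /* NULL means "use the built-in routine" */
    SketchReallocProc realloc_;
    SketchFreeProc    free_;
} SketchMemProcs;

/* Client data for an allocator that counts live blocks. */
typedef struct {
    long live_blocks;
} SketchCounter;

static void *counting_alloc(void *client, size_t size)
{
    void *p;
    assert(client != NULL);
    p = malloc(size);
    if (p != NULL)
        ((SketchCounter *)client)->live_blocks++;
    return p;
}

static void *counting_realloc(void *client, void *block, size_t size)
{
    void *p;
    assert(client != NULL);
    p = realloc(block, size);
    /* realloc(NULL, n) behaves like malloc(n), so count a new block. */
    if (block == NULL && p != NULL)
        ((SketchCounter *)client)->live_blocks++;
    return p;
}

static void counting_free(void *client, void *block)
{
    assert(client != NULL);
    if (block != NULL)
        ((SketchCounter *)client)->live_blocks--;
    free(block);
}

/* Library-side helpers: use the client's procedure if one was supplied. */
static void *lib_alloc(const SketchMemProcs *m, size_t size)
{
    return m->alloc ? m->alloc(m->client, size) : malloc(size);
}

static void *lib_realloc(const SketchMemProcs *m, void *block, size_t size)
{
    return m->realloc_ ? m->realloc_(m->client, block, size)
                       : realloc(block, size);
}

static void lib_free(const SketchMemProcs *m, void *block)
{
    if (m->free_)
        m->free_(m->client, block);
    else
        free(block);
}
```

Because the Library keeps using the record after initialization, it should live in static or heap storage rather than in a stack frame that returns while the Library is still active.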
A document has a number of properties that describe:
• how the document is to be written to an external form: either PDF, which may be read again, or PostScript, which may not be re-read (although it can be distilled back into PDF, or printed under certain conditions; see “Print Issues” on page 4.10 for more on what you can do with PostScript output)
• adding or removing pages, or moving pages from one document to another
• related areas within the document that are to be identified and indexed, enabling them to be rearranged and output in a specified order
• how the document should appear in the initial view of a document browser (in addition, a hierarchical structure of the document can be created as a navigational device)
• permissions and document information, obtained from flags carried in the document

The following tables identify the high-level functional specifications of the most significant document interface routines, grouped by property: Document File, Thread, Bookmark, Page Mode, Font List and Document Information.

Document File

Table 3-5: Document Interface—Document File
Order | Method | Description
1 | PDDocCreate() | Creates a new document and returns its document handle.
2 | PDDocOpen() | Opens an existing document, given a file name, a file system, a procedure used to test authorization (omitted by default), and a flag indicating whether the document should be repaired if it appears damaged.
3 | PDDocSave() | Writes a document as PDF, given a document object, save flags, a file name, and optionally a file system, a progress monitor procedure (which reports status as the save progresses) and client data for the monitor. The save flags are PDSaveIncremental, PDSaveFull, PDSaveCopy and PDSaveLinearized.
4 | PDFLPrintPDF() | Prints a document to PostScript, given a document object, a file name and a print parameter object. Refer to the SDK file PrintLib.h for the values of the print parameter object.
5 | PDFLToPs() | Writes a document to PostScript, given the document, a path, and control parameters.
6 | PDDocRelease() | Allows a document to be moved out of memory when it is no longer needed for active processing (it remains in the Library's cache).
7 | PDDocAcquire() | Moves a document back into active memory.
8 | PDDocClose() | Removes a document from cache and memory when it is no longer needed.
9 | PDEnumDocs() | Calls the specified procedure once for each document currently defined to the Library.

The Library may have any number of active documents, either created or read. Pages may be added, removed, or moved from one document to another.

Threads and Beads

A thread is a series of objects in a given order, usually used to indicate a set of text areas to be read in that order. Some PDF viewers support limited reformatting of threads to make reading on a display screen easier. A document may have any number of threads, which are maintained as a list. A bead is a rectangular area of the page, regardless of what that area contains. A number of additional functions set or access attributes of threads and beads.

The Library has no concept of "groups of pages," although pages may be strung together into a series using threads. The thread object is a means of collecting disparate elements within a document into a stream; some PDF viewers use it to locate the "next" thing to be seen. The collection of threads is a property of the document object.

Table 3-6: Document Interface—Thread
Order | Method | Description
1 | PDDocGetThreadIndex() | Returns the thread index of the specified document.
2 | PDDocGetThread() | Returns the specified (integer-indexed) thread of the specified document.
3 | PDDocNumThreads() | Returns the number of threads in the specified document.
4 | PDDocAddThread() | Adds a thread to a document.
5 | PDDocRemoveThread() | Removes a thread from a document.
6 | PDThreadNew() | Creates a new thread.
7 | PDThreadDestroy() | Deletes a thread from a document.
8 | PDBeadInsert() | Inserts a bead after a specified bead.

Bookmark

The bookmark object is a collection of bookmark operators. Like a table of contents, bookmarks aid navigation by acting as live links within the displayed document. Bookmarks may be nested hierarchically; the hierarchy is shown as varying levels of indentation when the bookmarks are displayed.

Table 3-7: Document Interface—Bookmark
Order | Method | Description
1 | PDDocGetBookmarkRoot() | Obtains the root node of the bookmarks of a given document.

Page Mode

The initial page viewing mode can be set to display one or a combination of the following:
• bookmarks
• thumbnails
• document

Table 3-8: Document Interface—Page Mode
Order | Method | Description
1 | PDDocSetPageMode() | Sets the document's page mode, a value of the enumerated type PDPageMode. The mode choices are PDDontCare, PDUseNone, PDUseThumbs, PDUseBookmarks and PDFullScreen.
2 | PDDocGetPageMode() | Gets the document's PDPageMode value.

Font List

The font object describes a specific font and encoding used to image text.

Table 3-9: Document Interface—Font List
Order | Method | Description
1 | PDDocEnumFonts() | Calls the specified procedure once for each font used in the specified document.

Document Information

Information about the document's file and its state is recorded in flags. The bit field composed of these flag values returns the requested information.

Table 3-10: Document Interface—Document Information
Order | Method | Description
1 | PDDocSetFlags() | Sets information about the document's file and its state. The flags, values of an enumerated type, specify various file status attributes.
The PDDocFlags values are PDDocRequiresFullSave, PDDocIsModified, PDDocDeleteOnClose, PDDocWasRepaired, PDDocNewMajorVersion, PDDocNewMinorVersion, PDDocOldVersion, PDDocSuppressErrors, PDDocIsEmbedded and PDDocIsLinearized.
2 | PDDocGetFlags() | Gets information about the document's file and its state.
3 | PDDocClearFlags() | Clears flags associated with a document.
4 | PDDocGetFile() | Gets the ASFile for a document that was read or written.
5 | PDDocGetID() | Gets the unique PDF file ID of a document.
6 | PDDocGetNumPages() | Gets the number of pages in a document.
7 | PDDocGetVersion() | Gets the major and minor PDF version of a document, as specified in the header of the PDF file.

Page Interface

The page interface contains routines that create, access or destroy pages of data. Pages exist only in terms of the document that contains them. Pages can be created or acquired by page sequence number, deleted, moved from one document to another, or re-sequenced within a document.

A page object represents a single side of a single sheet of media (a physical page). It can be larger than the actual page image, since it contains both the physical page size and the cropped (logical) page image size. A page must have at least one container, which provides structural grouping and is built when the page is constructed. The page's scaling and location within the presentation media are defined by the page's matrix when it is created or accessed. Annotations can be added to a page for viewing or removed from it.

The electronic view of a page can show one of three areas:
• the rectangular area that encloses all text, graphics, and images on the page (the bounding box)
• the logical page, which is defined by crop marks (the crop box)
• the physical page (the media box)

NOTE: The Adobe PDF Library requires the driver application to supply the x and y coordinates of the elements it adds to a page.
Table 3-11: Page Interface—Methods
Order | Method | Description
1 | PDDocCreatePage() | Creates and acquires a new page, inserted into the document at a specified location.
2 | PDDocAcquirePage() | Acquires the specified page, incrementing its reference count.
3 | PDDocDeletePages() | Deletes the specified range of pages, inclusive.
4 | PDDocInsertPages() | Inserts pages from one document into another, including anything associated with the pages, such as annotations.
5 | PDDocReplacePages() | Replaces a specified range of pages in one document with pages from another.
6 | PDDocMovePage() | Re-sequences pages within a document.
7 | PDPageGetBBox() | Gets the bounding box of a page: the rectangle that encloses all text, graphics, and images on the page.
8 | PDPageSetCropBox() | Sets the crop box of a page: the region of the page that is displayed and printed.
9 | PDPageGetCropBox() | Gets the crop box of a page.
10 | PDPageSetMediaBox() | Sets the media box of a page: the dimensions of the physical page.
11 | PDPageGetMediaBox() | Gets the media box of a page.
12 | PDPageAddAnnot() | Adds an annotation at the specified location in a page's annotation array.
13 | PDPageAddNewAnnot() | Adds an annotation to the page.
14 | PDPageGetAnnot() | Gets a specific annotation on a page.
15 | PDPageGetAnnotIndex() | Gets the index of a given annotation object on a specified page.
16 | PDPageRemoveAnnot() | Removes an annotation from the specified page.
17 | PDPageGetDefaultMatrix() | Gets the matrix that transforms user space coordinates to rotated and cropped coordinates.
18 | PDPageGetFlippedMatrix() | Gets the matrix that transforms user space coordinates to rotated and cropped coordinates with the y-axis flipped, so that the origin is at the top-left corner of the rotated, cropped page.
19 | PDPageAcquirePDEContent() | Creates a PDEContent from the PDPage's contents and resources.
20 | PDPageEnumContent() | Enumerates the contents of a page, calling a procedure for each drawing object in the page description.
21 | PDPageGetNumber() | Gets the page number of a specified page.
22 | PDPageGetDoc() | Gets the document that contains a specified page.

Graphical Interface

A container is created automatically when a page is created. Container objects not only hold content but also provide structural grouping. Graphical objects (objects that make a mark on the page) are placed into a container after a clip is set for the object. These objects are:
• Text
• Path (lines)
• Image
• Form

Their specification generally requires a graphic state, matrix, font and text state, which are set and modified by the following functions:

Table 3-12: Graphical Interface—Methods
Order | Method | Description
1 | PDEElementSetGState() | Sets the graphic state.
2 | PDEElementSetMatrix() | Sets location, rotation, and scaling.
3 | PDEElementSetClipPath() | Sets the clip for an element.
4 | PDEContentAddElem() | Inserts an element into a container.

Text and Fonts

The text methods are used to create, destroy, add, and access text and style information within pages. They are also used to draw marks via drawing primitives.

The principal text object is the text run. All text in a run must be in the same font, use the same graphics state, and have identical inter-word, inter-character and inter-line spacing throughout. A run may be as short as a single character or as long as a column of text. It must have a font object, a graphic state object and a matrix object associated with it, and may also have a text state object and a stroking matrix object. Text runs are contained within text objects (though it is acceptable to create an empty text object). Once created, a text run may be split into multiple runs or removed from its text object.

The Adobe PDF Library supports the use of the Adobe Base 14 fonts without embedding them in the PDF file.
The following additional font types, however, may be embedded in PDF files:
• Type 0
• Type 1
• TrueType
• Type 3
• Type 42
• OpenType (when available)

Multiple master fonts may also be used; as of the v7.x release, two multiple master fonts (one serif and one sans serif) are distributed with the Library. Manipulating fonts by changing a variety of font metrics is also supported. If the Library cannot find a specified font, it uses the font metrics stored in the document to faux (fake) the font, allowing Adobe Reader to mimic it.

Table 3-13: Graphical Interface—Text
Order | Method | Description of the Text structure
1 | PDEFontCreate() | Creates a font structure.
2 | PDFindSysFont() | Locates system font information for the specified font.
3 | PDEFontCreateFromSysFont() | Creates a font structure from a system font.
4 | PDETextAdd() | Adds a character or a text run to a PDEText object.
5 | PDETextRunSetGState() | Sets the graphics state of a text run.
6 | PDETextRunSetTextState() | Sets the text state of a text run.
7 | PDETextRunSetFont() | Sets the font of a text run.
8 | PDETextRunSetTextMatrix() | Sets the text matrix of a text run.
9 | PDETextRunSetStrokeMatrix() | Sets the stroke matrix of a text run.
10 | PDETextRemove() | Removes characters or text runs from a text object.
11 | PDETextSplitRunAt() | Splits a text run into multiple text runs.

Path

The path object is the mechanism for creating arbitrary line drawings in PDF. These may be as simple as a rule or as complex as a bar code. A path element is first created, and then its path data is set.

Table 3-14: Graphical Interface—Path
Order | Method | Description of Path structure
1 | PDEPathCreate() | Creates an empty path element.
2 | PDEPathSetData() | Sets new path data for a path element.
3 | PDEPathAddSegment() | Adds a segment to a path.
4 | PDEElementSetGState() | Sets the graphics state information for an element.
5 | PDEElementSetMatrix() | Sets the transformation matrix for an element.
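PDEPathSetData() in Table 3-14 takes path data as a flat array in which each operator is followed by the parameters it expects; the operators and their parameters are listed in Table 3-15. The C sketch below walks such an array and counts its operators, rejecting unknown operators and truncated parameter lists. It is a hypothetical illustration: the SK_* values are local stand-ins for kPDEMoveTo and the other PDEPathElementType constants (their numeric values here are invented), and double is used instead of ASFixed for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PDEPathElementType operators. */
enum SketchPathOp {
    SK_MOVE_TO, SK_LINE_TO, SK_CURVE_TO, SK_CURVE_TO_V,
    SK_CURVE_TO_Y, SK_RECT, SK_CLOSE_PATH
};

/* Number of parameters each operator expects (per Table 3-15),
   or -1 for an unknown operator. */
static int sketch_param_count(int op)
{
    switch (op) {
    case SK_MOVE_TO:
    case SK_LINE_TO:    return 2;  /* x1, y1 */
    case SK_CURVE_TO:   return 6;  /* x1, y1, x2, y2, x3, y3 */
    case SK_CURVE_TO_V:
    case SK_CURVE_TO_Y:
    case SK_RECT:       return 4;  /* four values each */
    case SK_CLOSE_PATH: return 0;
    default:            return -1;
    }
}

/* Walks an operator/parameter array of len entries. Returns the number
   of operators, or -1 on an unknown operator or truncated parameters. */
static int sketch_count_path_ops(const double *data, size_t len)
{
    size_t i = 0;
    int ops = 0;
    assert(data != NULL || len == 0);
    while (i < len) {
        int n = sketch_param_count((int)data[i]);
        if (n < 0 || i + 1 + (size_t)n > len)
            return -1;
        i += 1 + (size_t)n;   /* skip the operator and its parameters */
        ops++;
    }
    return ops;
}
```

For example, a rectangle outline drawn with one move, three line segments back to the start and a close-path occupies 13 array entries and five operators, whereas the single rectangle operator needs only five entries.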
The data expected by the path object is an array of ASFixed values, each containing either an operator or a parameter of an operator. Each operator is followed by the parameters it expects. All positions are relative to the specified media coordinates. The legal operators of PDEPathElementType are:

Table 3-15: Graphical Interface—Path: PDEPathElementType
Operator | Parameters | Description
kPDEMoveTo | x1, y1 | Moves the current point.
kPDELineTo | x1, y1 | Appends a straight line segment from the current point.
kPDECurveTo | x1, y1, x2, y2, x3, y3 | Appends a Bézier curve to the path.
kPDECurveToV | x1, y1, x2, y2 | Appends a Bézier curve to the current path when the first control point coincides with the initial point on the curve.
kPDECurveToY | x1, y1, x2, y2 | Appends a Bézier curve to the current path when the second control point coincides with the final point on the curve.
kPDERect | x1, y1, x2 (width), y2 (height) | Adds a rectangle to the current path.
kPDEClosePath | (none) | Closes the current path.

NOTE: The Adobe PDF Library requires the driver application to supply the x and y coordinates of the elements it adds to a page.

Image

The image object is the graphic renderer for PDF. The Adobe PDF Library accepts graphics only in bit-stream form, so users must have a tool to convert other graphic formats to bit streams. PDEImageCreate creates an image object.

Form

The form object is an arbitrarily complex collection of other graphic operators, which may be positioned and scaled freely. PDEFormCreate creates a new form from an existing COS object.

4 PDF Functionality

This chapter presents some examples of Adobe PDF Library functionality, with sample code to illustrate usage. It also discusses PDF documents themselves and how to manipulate them further, along with tips on optimizing performance.
OS/390 Examples

Assembler Examples

The Assembler language interface is discussed in Chapter 2, “Assembler Interface” on page 2.11.

C Examples

There are two C language examples. The first, Hello World (HWORLD@C), demonstrates the same functionality as the Assembler example:
• Set fill/stroke color
• Set line width
• Define a rectangle
• Demonstrate a stroked rectangle
• Demonstrate a filled rectangle
• Demonstrate the specification of a font
• Demonstrate text insertion
• Generate PDF output
• Demonstrate the use of non-base 13 font resources
• Import a bitmap graphic file
• Demonstrate page rotation

The second, Bill, generates a 10-page mock phone bill.

Samples of Code

The examples include comments that identify each function being illustrated. Refer to the Reference Library for machine-readable versions of the Adobe PDF Library code samples. The following programming language examples are included in the distribution:
• Assembler examples
• C examples

Table 4-16: Sample Code Files on CD
Language | Example | Description
Assembler | Error Checking | This sample does not create any output. It demonstrates the calls that should be made to provide error checking after calls to the Adobe PDF Library. If the Adobe PDF Library raises an exception, these calls make that information available to the application. Error messages are written to the DLIRPT file.
Assembler | Link Annotation via DLI | Demonstrates the use of DLI to place a link annotation within a document.
Assembler | Embedded Graphics from Extracted Pages | Demonstrates extracting a specific page from one PDF document for use as an embedded graphic within another PDF document.
Assembler | Graphic Lists | Demonstrates the use of the LoadGraphicList and dlpdfgetgraphicfromlist calls from Assembler.
Assembler | Basic File Open DLI | A modification of HWORLD@A that uses DLI in conjunction with APDFL API calls.
Assembler | Basic File Open | An Assembler Hello World sample, using APDFL API calls.
Assembler | Blank PDF Page | Demonstrates the use of the APDFL API to create a one-page PDF file with nothing placed on the page.
Assembler | Place Text on Page | Uses DLI to draw a rectangle and place text on a page.
Assembler | Text Annotation | Demonstrates the generation of text annotations on a PDF page, including how to specify whether an annotation should be open or closed when the document is opened.
C | Basic File Open | The C language version of the Hello World sample, using APDFL API calls.
C | Bill Sample with SAS/C Link | A mock phone bill composed of three parts: BILL.C, UTILS.C and UTILS.H.

All of the samples listed here are assumed to initialize and terminate the Adobe PDF Library. All examples that generate PDF output also demonstrate the following steps, which are not stated explicitly in the table:
• Create a document
• Save a document
• Create the page(s) for a document
• Save the page(s) for a document

All examples that insert text into a PDF document also demonstrate:
• Define a text element
• Define text runs
• Add text runs to the text element
• Add the text element to page content

All examples that generate drawn graphic components also demonstrate:
• Define a graphic state
• Define a graphic element
• Fill in the path operations/parameters array
• Set path paint options
• Set path graphic state
• Set path with operations array
• Insert path element into page content

PDF Document Options

This section lists the available options and, where appropriate, explains how to perform them. For further information about each option, consult the Adobe documentation included in the distribution.
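Several of the options below (linearization and garbage collection) are selected by ORing flags into the second parameter of PDDocSave. The sketch below shows that bit-flag pattern in isolation. The SK_SAVE_* constants are hypothetical stand-ins with invented values; real code would use PDSaveFull, PDSaveLinearized and PDSaveCollectGarbage from the SDK headers. Starting from a full save is an assumption of this sketch; check the SDK documentation for the flag combinations PDDocSave accepts.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only; the real PDSave*
   constants come from the SDK headers and may have other values. */
enum {
    SK_SAVE_FULL            = 1 << 0,
    SK_SAVE_LINEARIZED      = 1 << 1,
    SK_SAVE_COLLECT_GARBAGE = 1 << 2
};

/* Builds a save-flags word: start from a full save (an assumption of
   this sketch) and OR in each requested option. */
static unsigned sketch_save_flags(int linearize, int collect_garbage)
{
    unsigned flags = SK_SAVE_FULL;
    if (linearize)
        flags |= SK_SAVE_LINEARIZED;
    if (collect_garbage)
        flags |= SK_SAVE_COLLECT_GARBAGE;
    return flags;
}

/* Returns nonzero if every bit of flag is set in flags. */
static int sketch_has_flag(unsigned flags, unsigned flag)
{
    assert(flag != 0);
    return (flags & flag) == flag;
}
```

In real code, the value built this way would be passed as the second argument of PDDocSave.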
PDF Output Languages

The Adobe PDF Library supports the creation of PDF documents containing single- and double-byte languages. These include:

Single Byte
• Arabic
• Dutch
• English
• French
• German
• Hebrew
• Italian
• Brazilian Portuguese
• South American Spanish
• Spanish
• Swedish

Double Byte
• Chinese
• Japanese
• Korean

Bookmarking PDF Output

Bookmarks can be inserted into a PDF document to facilitate navigation, much like a table of contents; they appear in a viewing pane to the left of the document in Adobe Acrobat or Adobe Reader. For information on including bookmarks in PDF documents, refer to “Bookmark” on page 3.6. The DLI sample application Annotations also demonstrates methods for creating bookmarks, hyperlinks and other annotations within a new PDF document.

Linking PDF Output

Hypertext links can be created in PDF output that link to:
• a specified location in the current PDF document
• a specified location in another PDF document
• another PDF document
• a Uniform Resource Locator (URL), such as a web address

Compress PDF Output

PDF documents are not compressed by default. Linearized documents are always compressed; it is impossible to generate linearized output without compression.

Linearize PDF Output

OR the PDSaveLinearized flag with any other flags passed as the second parameter in the call to PDDocSave.

Optimize PDF Output

Optimization is applied automatically as part of linearization. Further optimization can be achieved by ORing the PDSaveCollectGarbage flag with any other flags passed as the second parameter in the call to PDDocSave. This prevents unreferenced objects from being included in the PDF output file.

Password-Protect PDF Output

PDF output can be protected by setting a password when the PDF is created.
Two passwords can be defined: a User password, which must be supplied to view the document, and a Master password, which must be supplied to change the document's protection settings and permissions. Each password is optional; you can use one without the other. For example, a document that should be freely readable but not modifiable does not need a User password, since anyone may open it for viewing. To prevent modifications, however, the appropriate Document Security settings must be made (for example, No Changing, or No Content Copying or Extraction), and a Master password must be assigned so that no one can change those settings without first proving ownership by supplying the password.

The DLI sample application Encrypt demonstrates how to define User and/or Master passwords when creating a new document, and the Adobe PDF Library sample application Decrypt demonstrates how to pass in the appropriate passwords to open the document or modify its contents.

Encrypt PDF Output

Encryption, another form of document protection, is also available for PDF output. The DLI sample application Encrypt demonstrates how to encrypt output, and the Adobe PDF Library sample application Decrypt demonstrates how to decrypt it.

PDF and PostScript from One Composition

PDF output is generated by calling PDDocSave with the PDDoc object and the output file name. PostScript output can be generated as well (or instead) by passing the same PDDoc object to PDFLPrintPDF.

NOTE: Output written to a file for later printing should be in PDF form, not PostScript. See “Print Issues” on page 4.10 for more information.

Compatibility Between PDF Documents

All PDF documents created using the Adobe PDF Library can be edited in Adobe Acrobat v4.0 or higher and viewed in Adobe Reader v4.0 or higher.
Applications that comply with v3.0 of Acrobat Exchange and Adobe Reader can also be created if desired, because the functions of that version have been maintained in the Library.

When working with PDF documents, you may need to combine information from one PDF document with another, or to compare two PDF documents to find the differences between them. Both tasks can also be performed in Adobe Acrobat.

Document Compare
The document comparison tool in Adobe Acrobat shows how one document differs from another. Using the Document > Compare Documents... menu option in Adobe Acrobat v7 (Adobe Acrobat v6 is similar), two PDF documents can be compared to find inconsistencies. The comparison displays a summary page listing the number of pages that differ, the number of pages that were added, and whether any pages were rearranged, and illustrates each difference in a side-by-side view.

Document Content Sharing
Some of the content of PDF documents created using the Adobe PDF Library can be shared: text and tables can be copied and pasted between documents as necessary. Shared content can either retain the formatting of its source document or be reformatted and even restructured.

Merging PDF Documents
PDF output can be merged in two ways:
• Two or more PDF documents can be merged into one.
• Parts of various PDF documents can be merged to form one new PDF document.

Merging Two PDF Documents
Two or more PDF documents can be combined to create a single new PDF document, so documents from a variety of locations can be assembled into one. In Adobe Acrobat, this is done with the Document > Insert Pages... menu option.
An Adobe PDF Library application can also merge documents, either by adding pages from one document to another or by creating a new file and populating it with pages from one or more others. The Adobe PDF Library sample application MergePDF demonstrates how to merge two files by opening one document, adding the other to it, and saving the result as a new file.

Merging Several PDF Documents into One
Another way to merge PDF documents is to define parameters that the Adobe PDF Library uses to search a set of PDF documents. The Library finds the defined section of each PDF (a page, a string of words, a graphic frame, and so on), extracts it, and places it in a new PDF document. Because the search can span a number of documents, the result is one new PDF document containing all of the extracted pieces.

For example, this type of merge is useful when a composite summary report must be built from a number of regional sales reports. Define the search criteria to find the section entitled "Summary." The search extracts this section from the first PDF document, places it in the new PDF document, and moves on to the next document, continuing until the summaries from all of the sales reports have been placed in the new document. The summary sections in the new report appear exactly as they did in the original reports.

Compatibility with External Applications
PDF documents created using the Adobe PDF Library can interact with a variety of external applications, which allows greater control and easier document creation. As previously noted, all PDF documents created using the Adobe PDF Library can be edited in Adobe Acrobat v4.0 or higher and viewed in Adobe Reader v4.0 or higher. These documents can therefore be manipulated in Adobe Acrobat to add features such as password protection.
A number of other Adobe and non-Adobe products can also work with these documents, widening the range of ways they can be manipulated.

Adobe Catalog
PDF documents created using the Adobe PDF Library can be indexed with the Adobe Catalog feature of Adobe Acrobat. The Adobe PDF Library and DLI documentation provided by Datalogics includes APDFLIndex.pdx files (for Acrobat v6 users), which allow rapid searching and retrieval of words and phrases across the documentation suite.

Adobe Photoshop and Adobe Illustrator
PDF documents that were created using the Adobe PDF Library and contain graphic elements are automatically linked to the Adobe graphics packages Adobe Photoshop and Adobe Illustrator. For instance, double-clicking a graphic image in the PDF document can launch Adobe Photoshop (if it is available) and display the graphic for editing. Graphics from PDF documents can also be imported into Adobe Photoshop or Adobe Illustrator for revision.

Microsoft Word and Excel
PDF documents created using the Adobe PDF Library can share information with Microsoft Word and Excel. Tables of contents and other tables can be copied from a PDF document and pasted into either program, and the structure of the information is preserved.

Plug-Ins
PDF documents created using the Adobe PDF Library work with PDF plug-ins that comply with PDF versions from 1.2 up to and including the current release level. For example, markup tools that let users annotate PDF documents directly with electronic sticky notes (annotations), highlights or underlines can be used. The digital signature tool Adobe Acrobat SelfSign can also be used to let users electronically sign off on individual files. Digital signatures are also supported within the Adobe PDF Library.
Optimizing Performance
In developing the Adobe PDF Library for OS/390 and OS/400, Datalogics has learned some lessons that can help developers with the more advanced tasks of building their applications.

Adding Text Runs to Text Objects
Datalogics found that the underlying library sorts text runs that are in the same text object and on the same baseline, and then calculates the space between them. Datalogics therefore explored the effect of placing each text run in its own text object, rather than placing all of the text runs on a page in a single text object. Processing is more efficient if your application groups all text with the same left edge into the same text object. Currently, different text runs on the same baseline should be placed in different text objects.

Print Issues
PostScript output can be generated by emitting print output to a file, as demonstrated in the Adobe PDF Library sample application PrintPDF. This is not necessarily the same PostScript that is generated when the stream is fed directly to an output device such as a printer; that stream may be (and frequently is) tailored to the specific capabilities of the device. If you need to generate (and keep) PostScript output from the Adobe PDF Library, you should understand the different ways it can be generated and the slightly different output each may produce.

Printing to File vs. Printing to Printer
There are three ways to generate PostScript output of a given document:
• Emitted to a file
• Emitted directly to a PostScript output device (for example, a printer)
• Emitted for a printer as above, but captured as a file

If you emit PostScript output to a file (the first option above), you will get a fairly generic PostScript representation of your original document, but not necessarily in a form that a given PostScript printer can use.
In particular, many PostScript printers expect some kind of proprietary Job Control Language at the start of a print job to configure the printer before printing begins, and output emitted to a disk file will not contain it. When emitting output directly to a given printer (the second option above), the Library application queries the operating system for the printer parameters in the printer's Device Context and adjusts its output to match what the printer expects. If you need to generate PostScript output to be printed on a specific device at some point in the future, you must capture the output for that printer (the third option above), rather than simply writing a PostScript file to disk and trying to send it to a printer later.

For example, this is the opening of the output from the PrintPDF sample application with the EMIT_TO_FILE setting, writing directly to a disk file (named out.ps in the original sample code):

    %!PS-Adobe-3.0
    %%Title: (untitled)
    %%Version: 1 3
    %%Creator: PDFL 6.0
    %%CreationDate: 14:56:56 09/19/05
    %%For: (Shane Looker)
    %%DocumentData: Binary
    %%LanguageLevel: 2
    %%BoundingBox: 0 0 612 792
    %%HiResBoundingBox: 0.0 0.0 612.0 792.0
    %%Pages: 1
    %%DocumentProcessColors: (atend)
    %%DocumentNeededResources: (atend)
    %%DocumentSuppliedResources: (atend)
    %%EndComments
    %%BeginDefaults
    %%EndDefaults
    %%BeginProlog

Now compare it with the output created with the EMIT_TO_PRINTER setting and sent to a PostScript-compatible HP LaserJet 4 Plus, read down to the same %%BeginProlog comment:

    %-12345X@PJL JOB
    @PJL SET ECONOMODE=OFF
    @PJL SET RESOLUTION = 600
    @PJL SET RET = MEDIUM
    @PJL ENTER LANGUAGE = POSTSCRIPT
    %!PS-Adobe-3.0
    %%Title: Test
    %%Creator: PScript5.dll Version 5.2
    %%CreationDate: 9/19/2005 15:10:8
    %%For: acg
    %%BoundingBox: (atend)
    %%Pages: (atend)
    %%Orientation: Portrait
    %%PageOrder: Special
    %%DocumentNeededResources: (atend)
    %%DocumentSuppliedResources: (atend)
    %%DocumentData: Clean7Bit
    %%TargetDevice: (HP LaserJet 4 Plus) (2013.111) 1
    %%LanguageLevel: 2
    %%EndComments
    %%BeginDefaults
    %%PageBoundingBox: 13 13 599 780
    %%ViewingOrientation: 1 0 0 1
    %%EndDefaults
    %%BeginProlog

The Library queried the printer's Device Context, available from the operating system, and added the job setup information expected by that specific printer model. The PostScript stream generated here is useful only for printing on the HP LaserJet 4 Plus; conversely, that printer expects a print submission with this coding in its opening lines. If you need to generate PostScript output for this printer now but send it to the printer later, this is the output stream you need to capture, as described below.

Generating Output for Later Printing
Output generated now for printing at a later date should ideally be created as a PDF file (preferably with embedded and subset resources), not a PostScript file, but sometimes circumstances require the data to be stored in PostScript form regardless. PostScript output emitted directly to a file (as the Adobe PDF Library sample application PrintPDF does in its default, as-shipped form) can later be distilled back into PDF, for example with Adobe Distiller or Adobe Normalizer Server. A stored PostScript file will not necessarily print successfully, however, depending on the printer's needs and capabilities (for example, color support, scaling or rotation). A simple PostScript document may print successfully; more complicated or elaborate ones may not. In addition, the device may require special configuration or additional coding that output sent to a disk file would not contain. For example, some brands of PostScript printer require a job prologue to precede the PostScript data.
A print stream emitted directly to that printer will contain the necessary prologue; output emitted to a file will not. If PostScript output is to be printed, it should be fed directly to the output device, not emitted to a file in PostScript form for later printing. If a document must be prepared now for printing later, it should be generated and stored in PDF form, not PostScript; in a later step, the Library application can read in the PDF document and print it directly to the device.

Directing Printer-Specific Output to a File
As seen in “Printing to File vs. Printing to Printer” on page 4.10, simply emitting PostScript output directly to a file will not necessarily produce a printable PostScript document, depending on the requirements of the printer to which the document will ultimately be sent. When you must generate PostScript print output for later use, however, observe the following guidelines. They refer to details of the Adobe PDF Library sample application PrintPDF; use that sample for your initial testing:
• To write output to a file in the same manner as it would go to a selected printer, use the rec.outFileName element of the Windows rec data structure to hold the name of the file to which the output will be directed.
• List the output device that would normally receive this print stream in the devicename element, as usual.
• Define EMIT_TO_PRINTER, not EMIT_TO_FILE.

When rec.outFileName is NULL, output is sent directly to the selected device; when it holds a non-NULL value, output is written to the indicated file instead. That output should print successfully if it is later sent to the device for which it was intended.

5 Data Conversion
This chapter explains the translation process from EBCDIC to ASCII.
Translations
Datalogics used the IBM LibASCII library in porting the Adobe PDF Library (v4.0 and above) to OS/390 and OS/400. The driver application may use these same functions for EBCDIC to ASCII translation. This is not mandatory, however; any conversion mechanism that works best with the application may be used instead.

Data Conversion (EBCDIC to ASCII)
The Adobe PDF Library for OS/390 and OS/400 provides facilities for both mandatory and optional conversion of EBCDIC to ASCII. Note that PostScript and PDF files created on an OS/390 or OS/400 system will be in ISO-Latin1; they cannot be read or printed directly as EBCDIC on OS/390 and OS/400 systems.

Content
Flowing data will appear correctly in PDF output, whether it is written as EBCDIC or ASCII, as long as the corresponding glyph mapping accompanies the data. Translating flowing data from EBCDIC to ASCII is necessary, however, for the PDF search feature to work. If the application data is in EBCDIC, the application should use the dl_etoa conversion routine (or another routine of the user's choice) to convert the data to ASCII before passing it to the Adobe PDF Library.

NOTE: dl_etoa assumes that the string to be converted is NULL-terminated.

dl_etoa
Translates an EBCDIC string to an ASCII string. Given a pointer to an EBCDIC string, it returns a char * to the same string, translated IN PLACE. Because dl_etoa changes the string IN PLACE, multiple conversions of the same field should be controlled or avoided.

dl_atoe
Translates an ASCII string to an EBCDIC string. Given a pointer to an ASCII string, it returns a char * to the same string, translated IN PLACE.
Because dl_atoe changes the string IN PLACE, multiple conversions of the same field should be controlled or avoided.

Metrics
Font metrics are contained in the .PFA files, and the Library requires this font metric information to be present. The .PFA files should be:
• in Intel format (transferred by FTP as binary data to the OS/390 or OS/400 system)
• identified as ADOBE.PDFLIB.RESOURCE.T1PFA (on OS/390 systems) or stored in the Integrated File System (on OS/400 systems)

The samples delivered by Datalogics for OS/400 assume that these files are located in /tmp/pdfres.

The metric information that a rendering application needs to display a PDF or PostScript file properly is part of the font file, so it does not need to be included separately from the embedded font. If the composition application performs line wrapping, however, the metrics must be presented to it in a form it can understand.

Literals
Literals, which act as markers for the rendering engine, must be converted to ASCII before they are included in the output file. Where the Adobe PDF Library would normally require a call to ASAtomFromString, an application working with EBCDIC data on OS/390 and OS/400 must call ASAtomFromEBCDICString instead. ASAtomFromEBCDICString is an extension to the Adobe PDF Library provided by Datalogics.

Numeric Values
Numeric values must be converted from the big-endian representation used on OS/390 and OS/400 to the little-endian representation expected by the interpreter. This conversion is built into the Adobe PDF Library.

Separate from the byte order of numeric values is the format in which the Adobe PDF Library stores numeric data: coordinates for text placement and point size values are stored in the upper 16 bits of a 32-bit word. In C, the ASInt32ToFixed and ASInt16ToFixed calls perform this transformation. In Assembler, the application must shift the numeric values left 16 bits, as demonstrated in the HWORLD@A source module.
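The two conversions this chapter leaves to the application, in-place text translation and the shift of numeric values into the upper 16 bits, can be sketched in plain C. The functions below are hypothetical stand-ins, not the Datalogics dl_etoa/dl_atoe routines or the Library's ASInt16ToFixed. The sketch assumes EBCDIC code page 037 and maps only letters, digits and the space character (anything else becomes '?'), whereas a production translation table would cover the whole code page. It mirrors the documented behavior: a NULL-terminated string is translated IN PLACE and a char * to it is returned.

```c
#include <stdint.h>

/*
 * Hypothetical sketch only: not the Datalogics dl_etoa/dl_atoe routines
 * or the Library's ASInt16ToFixed. Assumes EBCDIC code page 037 and
 * covers only letters, digits and space. Byte values are written
 * numerically so the code behaves the same on ASCII and EBCDIC hosts.
 */

/* Translate one CP037 byte to ASCII; unmapped bytes become '?' (0x3F). */
static unsigned char sketch_e2a_byte(unsigned char c)
{
    if (c >= 0xC1 && c <= 0xC9) return (unsigned char)(0x41 + (c - 0xC1)); /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return (unsigned char)(0x4A + (c - 0xD1)); /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return (unsigned char)(0x53 + (c - 0xE2)); /* S-Z */
    if (c >= 0x81 && c <= 0x89) return (unsigned char)(0x61 + (c - 0x81)); /* a-i */
    if (c >= 0x91 && c <= 0x99) return (unsigned char)(0x6A + (c - 0x91)); /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return (unsigned char)(0x73 + (c - 0xA2)); /* s-z */
    if (c >= 0xF0 && c <= 0xF9) return (unsigned char)(0x30 + (c - 0xF0)); /* 0-9 */
    if (c == 0x40) return 0x20;                                            /* space */
    return 0x3F;
}

/* Translate one ASCII byte to CP037; unmapped bytes become '?' (0x6F). */
static unsigned char sketch_a2e_byte(unsigned char c)
{
    if (c >= 0x41 && c <= 0x49) return (unsigned char)(0xC1 + (c - 0x41)); /* A-I */
    if (c >= 0x4A && c <= 0x52) return (unsigned char)(0xD1 + (c - 0x4A)); /* J-R */
    if (c >= 0x53 && c <= 0x5A) return (unsigned char)(0xE2 + (c - 0x53)); /* S-Z */
    if (c >= 0x61 && c <= 0x69) return (unsigned char)(0x81 + (c - 0x61)); /* a-i */
    if (c >= 0x6A && c <= 0x72) return (unsigned char)(0x91 + (c - 0x6A)); /* j-r */
    if (c >= 0x73 && c <= 0x7A) return (unsigned char)(0xA2 + (c - 0x73)); /* s-z */
    if (c >= 0x30 && c <= 0x39) return (unsigned char)(0xF0 + (c - 0x30)); /* 0-9 */
    if (c == 0x20) return 0x40;                                            /* space */
    return 0x6F;
}

/* Translate a NULL-terminated EBCDIC string to ASCII IN PLACE; returns s. */
char *sketch_etoa(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0x00; ++p)
        *p = sketch_e2a_byte(*p);
    return s;
}

/* Translate a NULL-terminated ASCII string to EBCDIC IN PLACE; returns s. */
char *sketch_atoe(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != 0x00; ++p)
        *p = sketch_a2e_byte(*p);
    return s;
}

/* 16.16 fixed point: the integer value occupies the upper 16 bits. */
typedef int32_t SketchFixed;

/* Same result as a 16-bit left shift; multiplying avoids undefined
   behavior when shifting negative values in C. */
SketchFixed sketch_int16_to_fixed(int16_t v)
{
    return (SketchFixed)v * 65536;
}
```

Because the translation happens in place, running sketch_etoa a second time over data that is already ASCII garbles it (in this sketch, "Hello 1" becomes "???????"). This is why repeated conversions of the same field should be controlled or avoided. For the fixed-point helper, a 612-point page width becomes 612 × 65536, the same value an Assembler program obtains by shifting 612 left 16 bits.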
Index

A
Adobe
  Acrobat 1.2, 1.4, 2.2
  Acrobat Distiller 2.2, 4.12
  Acrobat SelfSign 4.9
  Normalizer Server 2.2, 4.12
  Reader 1.2, 1.4, 2.2, 3.10
Adobe PDF Library
  Applications Programming Interface (API) manual 1.3
  Assembler Interface 2.11
  Call Structure 2.9
  Common Objects 2.10
  Composition Application diagram 2.8
  Data Flow diagram 2.7
  Design 2.7
  Document Conventions 1.4
  Operation 2.7
  OS/390 and OS/400 Enhancements 2.11
  Output Generation 2.9
  Sample Applications
    Decrypt 4.6
    DLViewer 1.2
    MergePDF 4.8
Assembler Interface 2.11
  Macros table 2.12

B
Backwards Compatibility, Ensuring 2.3

C
Comparing Documents 4.7
Compatibility
  between PDF Documents 4.7
  with External Applications 4.8
    Adobe
      Catalog 4.8
      Illustrator 4.9
      Photoshop 4.9
    Microsoft
      Excel 4.9
      Word 4.9
    Plug-Ins 4.9
Compatibility with External Applications
  Microsoft Excel 4.9
Conventions, Document 1.4

D
Data Conversions
  Translations
    Content 5.2
    EBCDIC to ASCII 5.2
    Literals 5.3
    Metrics 5.3
    Numeric Values 5.3
Developer Overview
  Document Conventions 1.4
  How This Book is Organized 1.4
  Introduction 1.2
  Related Documentation 1.5
  What You Should Know 1.3
DLI Sample Applications
  Annotations 4.5
  Encrypt 4.6
  DLViewer 1.2
Document Comparison 4.7
Document Content Sharing 4.7
Document Creation 2.6
Document Merging 4.7
  Several into one 4.8
  Two into one 4.8
Documentation
  Adobe
    3D Annotations Tutorial 1.6
    Acrobat and PDF Library API Overview 1.6
    Acrobat and PDF Library API Reference 1.6
    Adobe PDF Library Overview 1.6
    Portable Document Format Reference Manual 1.6
      Errata file 1.6
    SnippetRunner Cookbook 1.6
  Datalogics
    Adobe PDF Library and DLI Installation Guide 1.5
    Adobe PDF Library Developer Overview 1.5
    DLI Implementation and Reference Guide 1.5
    Java Interface User Guide 1.5

F
Fonts
  Multi-master 3.10

M
Merging PDF Documents 4.7
  Merging several into one 4.8
  Merging two existing documents 4.8
Multi-master fonts 3.10

N
Notes
  All Assembler function and structure names are uppercase 2.12
  ASN membership may be required for Adobe website access 1.6
  Conversion of PostScript to PDF not supported 2.6
  dl_etoa assumes NULL-terminated strings 5.2
  Documentation covers both Adobe and Datalogics-supported platforms 1.2
  Element coordinates required when adding to a page 3.12
  Library cannot be used for PDF display on OS/390 and OS/400 1.2
  Library requires x and y coordinates of elements added from driver 3.8
  Native Unix compilers available from Free Software Foundation 2.5
  No other Library calls permitted after PDFLTerm 3.3
  Output written to file for later printing should be in PDF 4.7
  PDF Library Supplement now combined into Acrobat and PDF Library API Reference 1.6
  PDF Reference Manual errata file available for download 1.6
  PDFLInit must be called before any other Library call 3.3
  Popup warning may occur for new PDF in old viewers 2.3
  Structure Variations may exist between PDF 1.6 and earlier 1.4
  Version designations in documentation 2.4

O
Optimizing Performance
  Adding Text Runs 4.9
OS/390
  Access choices 2.4
  Examples
    Assembler 4.2
    C Bill 4.2
    Hello World 4.2
    Sample Code Files on CD table 4.3
OS/400
  Access choices 2.4
Output
  Creation for Later Printing 4.12
  PDF 2.6
  PostScript 4.10
Output files
  Methods of generation 2.2
  Types 2.2

P
PDF
  File format 1.5
  Level
    Declarations in Output 2.3
    Declarations via DLI 2.3
  Output overview 2.2
PDF Library
  Development examples 2.5
  Maintenance of 2.2
  Port to HP-UX, OS/390 and OS/400 2.4
  Processing environments 2.2
  Version Control 2.2
PDF Output
  Bookmarking 4.5
  Compatibility between documents 4.7
  Compression 4.6
  Encryption 4.6
  Languages
    Double Byte 4.5
    Single Byte 4.5
  Linearization 4.6
  Linking (hypertext) 4.5
  Optimization 4.6
  Password Protection 4.6
  PDF and PostScript Generation 4.7
PDWordGetCharPoint 1.7, 1.8
  Renamed from PDWordGetNthCharPoint 1.7
  Vector calculation from 1.8
PDWordGetNthCharPoint
  Renamed to PDWordGetCharPoint 1.7
Platform-Specific Concerns
  OS/400 2.13
  Windows 2.13
Portable Document Format Reference Manual 1.5
PostScript
  Output overview 2.2
Print Issues
  Directing Printer-Specific Output to a File 4.13
  File vs. Printer 4.10
  Generating Output for Later Printing 4.12
  Overview 4.10
  Use of devicename 4.13
  Use of rec.outFileName 4.13

R
readme.txt 1.7
Release Notes 1.7

S
Subscript
  Determining placement positions 1.8
Superscript
  Determining placement positions 1.8

W
What’s New in Previous Releases
  v7.0.1
    PDF Functionality 1.9
What’s New in This Release
  Overview 1.7
  v7.0.1
    New PDWordGetCharPoint Method 1.7
    WordFinder 1.7