Preview only show first 10 pages with watermark. For full document please download

Provisioning Data

   EMBED


Share

Transcript

Provisioning Data Information in this document is subject to change without notice. Examples provided are fictitious. No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, for any purpose, in whole or in part, without the express written permission of Actuate Corporation. © 1995 - 2015 by Actuate Corporation. All rights reserved. Printed in the United States of America. Contains information proprietary to: Actuate Corporation, 951 Mariners Island Boulevard, San Mateo, CA 94404 www.actuate.com The software described in this manual is provided by Actuate Corporation under an Actuate License agreement. The software may be used only in accordance with the terms of the agreement. Actuate software products are protected by U.S. and International patents and patents pending. For a current list of patents, please see http://www.actuate.com/patents. Actuate Corporation trademarks and registered trademarks include: Actuate, ActuateOne, the Actuate logo, Archived Data Analytics, BIRT, BIRT 360, BIRT Analytics, The BIRT Company, BIRT Content Services, BIRT Data Analyzer, BIRT for Statements, BIRT iHub, BIRT Metrics Management, BIRT Performance Analytics, Collaborative Reporting Architecture, e.Analysis, e.Report, e.Reporting, e.Spreadsheet, Encyclopedia, Interactive Viewing, OnPerformance, The people behind BIRT, Performancesoft, Performancesoft Track, Performancesoft Views, Report Encyclopedia, Reportlet, X2BIRT, and XML reports. Actuate products may contain third-party products or technologies. Third-party trademarks or registered trademarks of their respective owners, companies, or organizations include: Mark Adler and Jean-loup Gailly (www.zlib.net): zLib. Adobe Systems Incorporated: Flash Player, Source Sans Pro font. Amazon Web Services, Incorporated: Amazon Web Services SDK. Apache Software Foundation (www.apache.org): Ant, Axis, Axis2, Batik, Batik SVG library, Commons Command Line Interface (CLI), Commons Codec, Commons Lang, Commons Math, Crimson, Derby, Hive driver for Hadoop, Kafka, log4j, Pluto, POI ooxml and ooxml-schema, Portlet, Shindig, Struts, Thrift, Tomcat, Velocity, Xalan, Xerces, Xerces2 Java Parser, Xerces-C++ XML Parser, and XML Beans. Daniel Bruce (www.entypo.com): Entypo Pictogram Suite. Castor (www.castor.org), ExoLab Project (www.exolab.org), and Intalio, Inc. (www.intalio.org): Castor. Alessandro Colantonio: CONCISE Bitmap Library. d3-cloud. Day Management AG: Content Repository for Java. Dygraphs Gallery. Eclipse Foundation, Inc. (www.eclipse.org): Babel, Data Tools Platform (DTP) ODA, Eclipse SDK, Graphics Editor Framework (GEF), Eclipse Modeling Framework (EMF), Jetty, and Eclipse Web Tools Platform (WTP). Bits Per Second, Ltd. and Graphics Server Technologies, L.P.: Graphics Server. Dave Gandy: Font Awesome. Gargoyle Software Inc.: HtmlUnit. GNU Project: GNU Regular Expression. Google Charts. Groovy project (groovy.codehaus.org): Groovy. Guava Libraries: Google Guava. HighSlide: HighCharts. headjs.com: head.js. Hector Project: Cassandra Thrift, Hector. Jason Hsueth and Kenton Varda (code.google.com): Protocole Buffer. H2 Database: H2 database. IDAutomation.com, Inc.: IDAutomation. IDRsolutions Ltd.: JPedal JBIG2. InfoSoft Global (P) Ltd.: FusionCharts, FusionMaps, FusionWidgets, PowerCharts. InfoVis Toolkit. Matt Inger (sourceforge.net): Ant-Contrib. Matt Ingenthron, Eric D. Lambert, and Dustin Sallings (code.google.com): Spymemcached. International Components for Unicode (ICU): ICU library. JCraft, Inc.: JSch. jQuery: jQuery, JQuery Sparklines. Yuri Kanivets (code.google.com): Android Wheel gadget. LEAD Technologies, Inc.: LEADTOOLS. The Legion of the Bouncy Castle: Bouncy Castle Crypto APIs. Bruno Lowagie and Paulo Soares: iText. Membrane SOA Model. MetaStuff: dom4j. Microsoft Corporation (Microsoft Developer Network): CompoundDocument Library. Mozilla: Mozilla XML Parser. MySQL Americas, Inc.: MySQL Connector/J. Netscape Communications Corporation, Inc.: Rhino. NodeJS. nullsoft project: Nullsoft Scriptable Install System. OOPS Consultancy: XMLTask. OpenSSL Project: OpenSSL. Oracle Corporation: Berkeley DB, Java Advanced Imaging, JAXB, Java SE Development Kit (JDK), Jstl, Oracle JDBC driver. PostgreSQL Global Development Group: pgAdmin, PostgreSQL, PostgreSQL JDBC driver. Progress Software Corporation: DataDirect Connect XE for JDBC Salesforce, DataDirect JDBC, DataDirect ODBC. Quality Open Software: Simple Logging Facade for Java (SLF4J), SLF4J API and NOP. Raphael. RequireJS. Rogue Wave Software, Inc.: Rogue Wave Library SourcePro Core, tools.h++. Sencha Inc.: Extjs, Sencha Touch. Shibboleth Consortium: OpenSAML, Shibboleth Identity Provider. Matteo Spinelli: iscroll. StAX Project (stax.codehaus.org): Streaming API for XML (StAX). Sam Stephenson (prototype.conio.net): prototype.js. SWFObject Project (code.google.com): SWFObject. ThimbleWare, Inc.: JMemcached. Twittr: Twitter Bootstrap. VMWare: Hyperic SIGAR. Woodstox Project (woodstox.codehaus.org): Woodstox Fast XML processor (wstx-asl). World Wide Web Consortium (W3C) (MIT, ERCIM, Keio): Flute, JTidy, Simple API for CSS. XFree86 Project, Inc.: (www.xfree86.org): xvfb. ZXing Project (code.google.com): ZXing. All other brand or product names are trademarks or registered trademarks of their respective owners, companies, or organizations. Document No. 141215-2-745310 August 21, 2015 Contents About Provisioning Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Part 1 Accessing data basics Chapter 1 Understanding data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 How a report accesses data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Supported data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Creating reusable data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Creating a connection profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Using a connection profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Setting connection properties when a report runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Setting the folder path for text files at run time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Setting the database user name and password at run time . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Troubleshooting data source problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Chapter 2 Understanding data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 About data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Viewing and changing output columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adding a computed field to a data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Verifying the data returned by a data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specifying the data to retrieve at run time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . About the Query Text property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specifying a value for the Query Text property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Combining data from multiple data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Combining data from multiple data sources in BIRT Designer Professional . . . . . . . . . . . . . . . Creating a union data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a joined data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joining on more than one key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specifying a join condition not based on equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Getting information about query execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reviewing where sorts, groups, and filters are executed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reviewing the query that is sent to the database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 18 19 21 22 23 24 24 27 27 31 34 36 38 39 39 Chapter 3 Binding data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 i About binding data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44 Understanding column bindings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44 Descriptive names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45 Dynamic updates of calculated data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46 Creating column bindings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47 Editing and deleting column bindings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .49 Copying data elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50 More about column-binding expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51 Part 2 Accessing data in supported data sources Chapter 4 Accessing data in a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Using data object data in a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 Connecting to a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 Using a data set in a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57 Using a data model in a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58 Using a cube in a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59 Chapter 5 Accessing data in a JDBC database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 About JDBC databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62 Accessing data using a textual query or stored procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62 Creating a JDBC data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62 Managing JDBC drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .63 Adding a JDBC driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64 Deleting a JDBC driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .66 Restoring a JDBC driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .66 Using a SQL query to retrieve data from a JDBC data source . . . . . . . . . . . . . . . . . . . . . . . . . .67 Writing a basic SQL query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67 Combining data from multiple tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .68 Using a stored procedure to retrieve data from a JDBC data source . . . . . . . . . . . . . . . . . . . . .72 Accessing data using a graphical query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75 Connecting to a database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .76 Specifying the data to retrieve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .77 Creating computed columns and complex expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .80 Filtering data rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .81 Grouping data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82 Filtering groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83 Chapter 6 Accessing data in a NoSQL database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 ii About NoSQL databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Accessing data in Amazon DynamoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Connecting to Amazon DynamoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Specifying the data to retrieve from Amazon DynamoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Filtering data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Accessing data in Amazon Relational Database Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 Connecting to Amazon RDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 Specifying the data to retrieve from Amazon RDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Accessing data in Cassandra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Connecting to a Cassandra scripted data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Specifying the data to retrieve from a Cassandra scripted data source . . . . . . . . . . . . . . . . . 97 Accessing data in Cloudera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Connecting to a Cloudera database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Specifying the data to retrieve from a Cloudera system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Accessing data in Hadoop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Connecting to a Hadoop system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Specifying the data to retrieve from a Hadoop system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Accessing data in HBase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Connecting to an HBase database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Specifying the data to retrieve from an HBase system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Accessing data in MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Connecting to MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .113 Specifying the data to retrieve from MongoDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .114 Mapping selected fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .116 Writing expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .118 Setting runtime connection properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Accessing data in Salesforce.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Connecting to Salesforce.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Specifying the data to retrieve from Salesforce.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Chapter 7 Accessing data in files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 About file data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Accessing data in an Excel spreadsheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Supported data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Connecting to an Excel workbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specifying the data to retrieve from an Excel workbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Accessing data in a text file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Text file structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Text file data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a flat file data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specifying what data to retrieve from a text file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Accessing data in an XML file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii 128 128 128 129 130 131 132 133 133 134 136 Creating an XML data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .136 Specifying what data to retrieve from an XML data source . . . . . . . . . . . . . . . . . . . . . . . . . . .137 Accessing data in a BIRT report document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .141 Creating a report document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Specifying bookmark names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 Specifying element names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .145 Connecting to a report document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .146 Specifying the data to retrieve from a report document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .147 Accessing data in an e.report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .148 About ActuateOne for e.Reports Data Connector functionality . . . . . . . . . . . . . . . . . . . . . . .149 Accessing an e.report using Page Level Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Accessing an e.report having multiple sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .149 Connecting to an e.report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Specifying the data to retrieve from an e.report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .151 Chapter 8 Accessing data from a web service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Using web services data in a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .156 Connecting to a web service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 Specifying what data to retrieve from a web service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .158 Creating a web services data source using a custom connection class . . . . . . . . . . . . . . . . . . . .169 Chapter 9 Accessing data in a POJO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Using POJO data in a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 Connecting to a POJO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .178 Specifying the data to retrieve from a POJO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .180 Part 3 Modeling data Chapter 10 Creating data objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 About data objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 Design considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .188 Designing data objects for dashboards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 Designing data objects for reports created with Report Studio . . . . . . . . . . . . . . . . . . . . . . . .191 Designing data objects for reports created with BIRT Designer Professional . . . . . . . . . . . .191 Building a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 Creating new items for a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .192 Exporting items to a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 Creating a shared dimension for cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .195 iv Configuring data set columns for summary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Creating hyperlinks to provide drill-down capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 Hiding data sets from users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 Building a data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 Defining joins in a data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Joining data sets automatically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Removing join loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Disabling join push-down . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Creating data set aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Joining data sets on NULL column values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 Materializing the data sets in a data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 Filtering data in charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Enhancing the usability of a data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Creating a category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Renaming a column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 Creating a computed column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .211 Specifying a drill path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212 Providing cached data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Publishing a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 Enabling incremental updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Replacing the .data file in an iHub volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 Replacing the .data file in a file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 Maintaining a data object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 Chapter 11 Providing data security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 About data security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using security IDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Access control lists and data security rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using data protected by data security rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparing the design and deployment environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparing data security using live and cached data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Designing and managing data security rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Structure of a data security rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Accessing security IDs on an iHub volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Effects of data model changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data security rule creation and assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Understanding implicit join behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Viewing data security rule assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Editing a data security rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deleting a data security rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Designing a data security rule policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Granularity design considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v 220 220 221 221 222 222 223 224 224 225 225 233 235 236 236 236 237 Implicit join considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .237 Combining rule conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .238 Part 4 Accessing data from BIRT iHub Chapter 12 Connecting to data sources from BIRT iHub . . . . . . . . . . . . . . . . . . . . . 241 About data source connections on BIRT iHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .242 Understanding AC_SERVER_HOME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .242 Understanding acserverconfig.xml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 About drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 About information object connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Using a connection configuration file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .243 Changing a configuration file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .244 Specifying the location of the connection configuration file . . . . . . . . . . . . . . . . . . . . . . . . . .244 Defining environment variables for BIRT iHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .246 Connecting BIRT iHub to data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 Connecting to a DB2 data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .247 Defining DB2 environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .248 Using DB2 libraries on HP-UX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .249 Checking a connection to a DB2 instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .249 About using XML Extender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .249 Connecting to an Informix data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .249 Connecting to an Oracle data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .250 Defining Oracle environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 About the Oracle listener processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Setting the maximum column length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .251 Connecting to custom data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Installing a custom open data access (ODA) driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .252 Using custom Java-based data source connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .253 Using custom relational data sources with the Integration service . . . . . . . . . . . . . . . . . .253 Specifying connection types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .254 Using a connection pool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 ODBC/JDBC connection pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .256 Configuring BIRT JDBC connection pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .256 Setting the Encyclopedia engine connection pool reap interval . . . . . . . . . . . . . . . . . . . . . . .258 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 vi Ab ou t Provis ion in g Dat a Provisioning Data provides information about accessing data from JDBC, NoSQL, file-based, and web services data sources for use with BIRT applications. Provisioning Data also describes how to design, secure, and use Actuate data objects and data models. Provisioning Data includes the following chapters: ■ About Provisioning Data. This chapter provides an overview of this guide. ■ Part 1. Accessing data basics. This part explains how to create data sources and data sets and how to bind data. ■ Chapter 1. Understanding data sources. This chapter lists supported data sources and explains how a report accesses data. ■ Chapter 2. Understanding data sets. This chapter explains how to create a data set and how to combine data from multiple data sources. ■ Chapter 3. Binding data. This chapter explains how to create data bindings. A data binding is an expression that specifies what data to display. ■ Part 2. Accessing data in supported data sources. This part explains how to access data in data objects, JDBC databases, NoSQL databases, flat files, XML files, web services, and Excel spreadsheets. ■ Chapter 4. Accessing data in a data object. This chapter explains how to access data provided by a data object. ■ Chapter 5. Accessing data in a JDBC database. This chapter explains how to retrieve data from JDBC databases such as Oracle, DB2, SQL Server, Sybase, and Informix. ■ Chapter 6. Accessing data in a NoSQL database. This chapter explains how to retrieve data from NoSQL databases such as MongoDB, Hadoop, and Amazon Relational Database Service. About Provisioning Data vii viii ■ Chapter 7. Accessing data in files. This chapter explains how to access data in flat files, XML files, Excel spreadsheets, and BIRT report documents. ■ Chapter 8. Accessing data from a web service. This chapter explains how to access data provided by a web service. ■ Chapter 9. Accessing data in a POJO. This chapter explains how to access data provided by a Plain Old Java Object (POJO). ■ Part 3. Modeling data. This part explains how to create data objects, access the data in a data object, and determine the effect of data object changes on other report objects. ■ Chapter 10. Creating data objects. This chapter explains how to create data objects. A data object is a BIRT object that contains all the information necessary to connect to an external data source, retrieve data from that data source, and structure the data in a way that supports business analysis. ■ Chapter 11. Providing data security. This chapter explains how to secure data provided by data objects. Data security rules support restricting access to data rows based on security identifiers assigned to a BIRT iHub user. ■ Chapter 12. Connecting to data sources from BIRT iHub. This chapter explains how to connect to data sources using a BIRT iHub connection configuration file. Provisioning Data Part One 1 Accessing data basics Part 1 ■ Understanding data sources ■ Understanding data sets ■ Binding data Chapter 1 Understanding data sources Chapter 1 This chapter contains the following topics: ■ How a report accesses data ■ Supported data sources ■ Creating reusable data sources ■ Setting connection properties when a report runs ■ Troubleshooting data source problems Chapter 1, Understanding data sources 3 How a report accesses data A report uses the same mechanism to access data from any of the sources listed in Table 1-1. First, you create a data source, which is a BIRT object that contains the information to connect to an underlying data source. Each type of data source requires different connection information. For a JDBC data source, for example, you specify the driver, URL, and user login to connect to a database. An XML data source requires the location of the XML file and, optionally, the location of the XML schema. Next, you create a data set, which is a BIRT object that specifies and returns all the data that is available to a report. For a JDBC data source, for example, you write a SQL query or run a stored procedure to retrieve specific data from a database. For an XML data source, you use XPath expressions to specify the XML elements and attributes from which to retrieve data. Supported data sources BIRT Designer Professional supports all the types of data sources that open source BIRT Report Designer supports, and more. Table 1-1 lists the types of data sources that each product supports. Part 2, “Accessing data in supported data sources,” explains how to access data in supported data sources. Table 1-1 Supported data source types Data source type 4 BIRT Designer Professional BIRT Report Designer Actuate data object ✓ Actuate information object ✓ Actuate JDBC Salesforce.com ✓ ActuateOne for e.Reports ✓ Amazon DynamoDB ✓ Amazon RDS ✓ BIRT report document ✓ Cassandra scripted ✓ Cloudera Hive ✓ Excel ✓ ✓ Flat file ✓ ✓ HBase ✓ Provisioning Data ✓ Table 1-1 Supported data source types BIRT Designer Professional BIRT Report Designer Hive ✓ ✓ JDBC database connection for Query Builder ✓ ✓ JDBC ✓ ✓ MongoDB ✓ ✓ Plain Old Java Object (POJO) ✓ ✓ Scripted ✓ ✓ Static ✓ Web service ✓ ✓ XML document ✓ ✓ Data source type Creating reusable data sources Each BIRT data source you create contains the connections properties required to connect to a database, flat file, XML document, or web service. Often, multiple reports require access to the same source. You might, for example, create a suite of reports that use data from the same corporate sales database. Rather than typing the same connection information repeatedly for each report, you can: ■ Enter the connection information once in a file called a connection profile, then link the connection profile to any BIRT data source that requires that connection information. ■ Create one BIRT data source, then store it in a library that all reports can access. The basic principles behind both techniques are simple: define connection properties once, store the information in a shared location, and reuse the information in as many reports as needed. Another advantage that both techniques offer is the ability to update connection properties in one location and have the changes propagate to all reports. Creating a connection profile You use Data Source Explorer (not to be confused with Data Explorer) to create a connection profile. You can create a connection profile for any data source type, and you can create as many profiles as needed. For example, if your reports frequently access data from a particular database and a particular web service, you can create one connection profile for the database, and one for the web service. Chapter 1, Understanding data sources 5 BIRT saves all the connection profiles in a single file named ServerProfiles.dat. This file is saved in the .metadata folder in your current workspace. The path is: ...\workspace\.metadata\plugins \org.eclipse.datatools.connectivity\ServerProfiles.dat To share your connection profiles with other report developers, place ServerProfiles.dat in a central location. You can rename the file in the event that there are other ServerProfiles.dat files in the central location. How to create a connection profile 1 Choose Window➛Show View➛Other. 2 In Show View, expand Data Management and select Data Source Explorer, then choose OK. Data Source Explorer appears at the bottom of the application window. 3 In Data Source Explorer, expand ODA Data Sources. Data Source Explorer lists the data source types and connection profiles, if any were defined previously. Figure 1-1 shows the list of data source types in Data Source Explorer. Figure 1-1 Data Source Explorer 4 Right-click the data source type for which you want to create a connection profile, then choose New. 5 In New Data Source Profile, type a name and description for the connection profile. Figure 1-2 shows an example of information specified for a JDBC connection profile. Figure 1-2 6 Provisioning Data Name and description specified for a JDBC connection profile Choose Next. 6 In New Data Source Profile, specify the connection values to connect to the data source. Figure 1-3 shows an example of connection properties to connect to a JDBC data source. Figure 1-3 Connection properties for a JDBC data source 7 Choose Test Connection to verify the connection. 8 Choose Finish. The new connection profile appears in Data Source Explorer, as shown in Figure 1-4. It displays the name you specified when you created the connection profile. Figure 1-4 Data Source Explorer displays the new JDBC connection profile Using a connection profile You have the option of creating a new data source using information stored in a connection profile. Using a connection profile saves time and reduces the potential for connection errors because all you need to do is supply the location of the profile. You do not need to know or remember all the required connection information, and, presumably, the connection profile has been tested. To use a connection profile created by another report developer, make sure you get the location of the profile. Chapter 1, Understanding data sources 7 How to create a data source that uses a connection profile 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, select Create from a connection profile in the profile store, as shown in Figure 1-5. Choose Next. Figure 1-5 Creating a data source that uses a connection profile 3 In Connection Profile, in Connection Profile Store, specify the location of the file that contains the connection profiles. You can choose Browse to find and select the file. As described in the previous topic, the default file name is ServerProfiles.dat, and BIRT saves this file, by default, in the .metadata folder in the workspace in use when the connection profile was created. The default path is: ...\workspace\.metadata\.plugins\ org.eclipse.datatools.connectivity\ServerProfiles.dat 4 After you specify the location of the .dat file, Connection Profile displays all the available profiles. Figure 1-6 shows an example of two connection profiles, SystemLogs and AcmeSales. 5 Select the connection profile to use. By default, Use the default data source name is selected. If you selected the SystemLogs profile, the new data source is also named SystemLogs by default. 6 Optionally, specify a different name for the data source. 7 Choose Next. New Data Source Profile displays in read-only text the connection information derived from the connection profile. 8 Choose Finish. The new data source appears under Data Sources in Data Explorer. 8 Provisioning Data Figure 1-6 Connection Profile displays two predefined profiles Setting connection properties when a report runs When you first design and test a report, you create a BIRT data source that uses hard-coded information. For example, the procedure for creating a JDBC data source includes providing a user name and password that the report uses to access a database. When the report is run, the database uses whatever user groups and privileges are assigned to the hard-coded report user. Production reports, however, cannot always use the hard-coded information specified at report design time. A typical example is prompting a report user to provide his credentials when he runs the report. Based on the specified credentials, the database authentication system determines the appropriate data to use to generate the report. To support the setting of connection properties at report run time, the data source editor provides a feature called property binding. As its name suggests, this feature supports binding each connection property to a JavaScript expression that evaluates to a value that the report uses at run time. Because each data source type requires different connection information, the Property Binding page displays a different set of connection properties for each data source type. The Property Binding page is available only when you edit an existing data source. It is not available when you create a new data source. Figure 1-7 shows the Property Binding page for a JDBC data source. Chapter 1, Understanding data sources 9 Figure 1-7 Property Binding page showing JDBC connection properties that can be set at run time Figure 1-8 shows the Property Binding page for a flat file data source. Notice that, for both types of data sources, all the connection properties that you set at design time can also be set dynamically at run time. As the figures also show, if the data source uses a connection profile, you can dynamically assign a connection profile and its location at run time. Figure 1-8 Property Binding page showing flat file connection properties that can be set at run time The following sections show two examples of setting connection properties at run time. Setting the folder path for text files at run time To access a text file, one of the properties that you must specify is the path to the folder that contains the text file. There are cases when the path to the folder can be 10 Provisioning Data determined only at run time. Consider the following scenario: A log file named log.csv is generated daily. Each daily log is stored in an auto-generated folder whose name is the current date. Examples of the full path to these folders are as follows: C:\Logs\2011-07-01 C:\Logs\2011-07-02 C:\Logs\2011-07-03 You design a report that displays data from log.csv. When the report is run, the report uses the log.csv data for the current day. For example, if the report runs on July 1, 2011, the flat file data source uses the log.csv file in the 2011-07-01 folder. If the report runs on July 2, 2011, the data source uses the log.csv file in the 2011-07-02 folder. The following procedure shows how to write a JavaScript expression that returns the full folder path value based on the current date. How to set the folder path for text files at run time This procedure assumes that you have already created a flat file data source. 1 In Data Explorer, right-click the flat file data source, then choose Edit. 2 In Edit Data Source, choose Property Binding. The Property Binding page displays the flat file connection properties that you can set at run time. 3 Choose the expression builder button to the right of Home Folder. 4 In the expression builder, type the following expression: function DF(n) { return (n > 9 ? n : '0' + n); } var d = new Date(); shortDate = (d.getFullYear() + '-' + DF(d.getMonth() + 1) + '-' + DF(d.getDate())); HomeFolder = "C:\Logs\" + shortDate; The JavaScript functions, getFullYear( ), getMonth( ), and getDate( ), get the parts of the full date and time returned by new Date( ). The user-defined DF function formats the month and day parts of the date. The shortDate variable contains the date in 2009-01-01 format. The HomeFolder variable contains the full folder path, which is constructed by concatenating the static path with the shortDate variable. 5 Choose OK to save the expression. Edit Data Source shows the JavaScript expression bound to the Home Folder property, as shown in Figure 1-9. Chapter 1, Understanding data sources 11 Figure 1-9 JavaScript expression bound to the Home Folder property 6 Choose OK to save your changes to the data source. Setting the database user name and password at run time When a report accesses data in a database, a typical action is to prompt the report user to type his user name and password at run time. You accomplish this action by using report parameters. Report parameters provide a mechanism for getting values at run time and passing the values to the report. The following procedure shows how to create report parameters and bind the parameters to the user name and password properties in a JDBC data source. How to enable a user to provide a database user name and password when a report runs This procedure assumes that you have already created a JDBC data source. 1 Create a report parameter to get the user name, using the following steps: 1 In Data Explorer, right-click Report Parameters, then choose New Parameter. 2 In New Parameter, supply the following information: ❏ In Name, type a name for the report parameter. For example: username_param ❏ In Prompt text, specify a word or sentence to prompt the report user to provide a user name value. For example: User name ❏ 12 In Data type, select String. Provisioning Data Figure 1-10 shows the completed report parameter definition. Figure 1-10 Report parameter to get the user name 3 Choose OK. The username_param parameter appears under Report Parameters in Data Explorer. 2 Use the previous steps to create a report parameter to get the password. Use the following values to define the report parameter: ■ In Name, type: password_param ■ In Prompt text, type: Password ■ In Data type, select String. 3 Choose OK. The password_param parameter appears under Report Parameters in Data Explorer. 4 Bind the user name property to the username_param report parameter. 1 In Data Explorer, right-click the JDBC data source, then choose Edit. 2 In Edit Data Source, choose Property Binding. The Property Binding page displays the JDBC connection properties. 3 Choose the expression builder button on the right of User Name. 4 In the expression builder, perform the following tasks: ❏ Under Category, choose Report parameters. All appears under Sub-Category. Chapter 1, Understanding data sources 13 ❏ Choose All. Under Double Click to insert, BIRT Report Designer displays the report parameters that you created. ❏ Double-click the username_param report parameter. The expression params["username_param"].value appears in the text area, as shown in Figure 1-11. Figure 1-11 ❏ Choosing the username_param in the expression builder Choose OK. The expression builder closes. The report parameter expression appears in the User Name field, as shown in Figure 1-12. Figure 1-12 User Name property bound to the username_param report parameter 5 Using the previous steps, bind the password property to the password_param report parameter. 14 Provisioning Data 5 In Edit Data Source, choose OK. 6 Preview the report to confirm that the user is prompted for a user name and password. Figure 1-13 shows Enter Parameters, the dialog that prompts the user to enter a user name and password. Figure 1-13 Enter Parameters prompts the user for a user name and password The values that the user specifies are used to connect to the JDBC database. Troubleshooting data source problems BIRT Report Designer displays information about data source connection problems in several different places. Error reports can appear in the previewer, the Problems view, the error log view, and as pop-up messages. Generally, BIRT Report Designer displays JDBC connection-related problems in pop-up error messages. If the connection information is syntactically correct, but the data source is not available, you see a pop-up message and entries in the error log view. A list of errors appears in the Problems view if a data source is incorrectly defined. You cannot manually delete items from the Problems view. They display until you resolve the problem or delete the object that is creating the problem. If you have problems connecting to a data source from BIRT Report Designer, try connecting using a third-party tool to confirm that the connection string works as expected. This troubleshooting exercise can help you determine whether to focus your troubleshooting on the driver or on the parameters that you have provided. If you make changes to the connection parameters and BIRT Report Designer behaves as though it is still using the original values for the parameters, restart Eclipse using the -clean option. What has happened is that Eclipse is using cached information that contains the previous values. To clear the cache, exit Eclipse and restart using the -clean option. Chapter 1, Understanding data sources 15 16 Provisioning Data Chapter 2 Chapter 2 Understanding data sets This chapter contains the following topics: ■ About data sets ■ Viewing and changing output columns ■ Adding a computed field to a data set ■ Verifying the data returned by a data set ■ Specifying the data to retrieve at run time ■ Combining data from multiple data sources ■ Combining data from multiple data sources in BIRT Designer Professional ■ Getting information about query execution Chapter 2, Understanding data sets 17 About data sets A data set is an object that defines all the data that is available to a report. To create a data set, you must have an existing BIRT data source. As with data sources, BIRT Report Designer provides wizards to create data sets. The first and the only required step in creating a data set is to select the data to retrieve from a data source. After this first step, you can optionally process the raw data as needed for the report. For example, you can change the names of columns, create computed columns, and define filters to provide a subset of the data to the report. You can create as many data sets as are necessary for a report. Typically, you create at least one data set for each data source. For example, if you created a JDBC data source and an XML data source in a report, you would create a JDBC data set and an XML data set. You can also create multiple data sets that use a single data source. If, for example, a report displays sales data from the same database in a chart and in a table, you can create one data set to return data for the chart, and another data set to return data for the table. In this scenario, creating different data sets can improve performance because each data set retrieves only the specific data required by each report element. Alternatively, to use identical data, both items can share the same data set. BIRT Report Designer provides a default name for each data set you create in a report. The names begin with Data Set. If you create two data sets, the default names are Data Set and Data Set1. You should use descriptive names, such as CorporateSalesDB or SupportLogXML, that enable you to identify easily the type of data that the data set provides. For an explanation on how to create data sets for supported data sources, see Part 2, “Accessing data in supported data sources.” Viewing and changing output columns Use the Output Columns page of the data set editor to view the list of columns selected for retrieval, and to specify an alias or display name for each column. BIRT Report Designer uses the display name in Data Explorer and for the column headings in a table. For example, you can give a column named $$CN01 a display name of Customer Name. This display name makes the column easier to identify in Data Explorer and more user-friendly in the column heading of a table. If you are creating a report for multiple locales, you can localize each display name by providing a resource key in the Display Name Key property. A resource key is a text string in an external source that is translated, or localized, into different languages. Specify an alias to use a shorter or more recognizable name when referring to the column in code. For example, give a column named $$CN01 an alias of custName 18 Provisioning Data so that you can write row["custName"] instead of row["$$CN01"]. If you do not specify a display name for the column, BIRT Report Designer displays the alias in Data Explorer and for the column headings in a table. How to view and change output columns 1 Choose Output Columns from the left pane of Edit Data Set. Output Columns displays the names and types of the columns, as shown in Figure 2-1. Figure 2-1 Viewing the output columns for a data set 2 To edit the properties of an output column, select the column name, then choose Edit. Edit Output Column displays the properties you can set. 3 Choose OK to save the edits. Adding a computed field to a data set A data set can contain computed data as well as data that is returned from a data source. Computed data displays the result of an expression, typically involving one or more columns from a data source. For example, if each row that is returned from the data source contains a price and a quantity, you can create a computed field that calculates the total amount paid, using the following expression: row["pricequote"] * row["quantity"] You can also concatenate values from multiple fields, using the + operator, or calculate values using JavaScript functions. The expression builder provides a list of operators and functions that you can use to build expressions. You can also define a computed field in the report layout. Defining computed fields in the data set is, however, the preferred approach. Defining the computed field in the data set separates business logic from the presentation of the data. Defining the computed field in the data set also enables you to verify the results of the calculation in the Preview Results page of Edit Data Set. You can determine whether the expression for the computed field is correct before using the field in the report design. Figure 2-2 shows the Preview Results page including the results of a computed field, Total_cost. Chapter 2, Understanding data sets 19 Figure 2-2 Preview Results including the results of a computed field, Total_cost How to add a computed field to a data set 1 In Edit Data Set, choose Computed Columns. 2 Choose New to create a new computed field. 3 In New Computed Column: 1 In Column Name, type a name for the computed field. 2 In Data Type, select a data type appropriate for the data returned by the computed field. 3 To perform a calculation on all the rows in the data set, in Aggregation, select an aggregate function. If performing an aggregate calculation, you can optionally specify a filter expression in Filter to determine which rows to include in the calculation. 4 In Expression, specify the expression to calculate the desired value. You can either type the expression or use the expression builder to construct the expression. To use the expression builder, complete the following steps: ❏ Choose the expression builder button to open the expression builder. In Category, select Available Data sets, then select your data set. Double-click an item to add it to the text area at the top. Figure 2-3 shows how to create an expression for a computed column. ❏ Choose OK to save the expression. The expression appears in the Expression field in New Computed Column. 5 Choose OK to save the computed field. Computed Columns displays the computed field and the expression you defined. 20 Provisioning Data Figure 2-3 Creating an expression for a computed column 4 Choose Output Columns to see all the columns that are specified in the data set. The computed field appears on this page. 5 Choose Preview Results to confirm that the computed field returns the correct data. 6 Choose OK to save your changes to the data set. Verifying the data returned by a data set After creating a data set, always use Preview Results to verify that the data set returns the expected data. Figure 2-4 shows an example of a result set returned by a JDBC data set. By default, Preview Results shows up to 500 data rows. If you expect the data set to return more than 500 data rows and you want to see all the rows, increase the number of rows that Preview Results displays. Doing so, however, can increase the amount of time it takes to display the results. How to change the number of rows that appear in Preview Results 1 Choose Window➛Preferences. 2 In Preferences, click the plus sign (+) beside Report Design to expand the item. 3 Choose Data Set Editor. 4 In Number of rows to display, type the maximum number of rows to display, then choose OK. Chapter 2, Understanding data sets 21 Figure 2-4 Previewing the results of a data set Specifying the data to retrieve at run time In many reports, the data you select is hard-coded at design time. Some reports, however, require the ability to display a different set of data based on run time criteria, such as user login or the data source that a report user selects. As described in Chapter 1, “Understanding data sources,” the data source editor provides a feature called property binding to support the setting of connection properties at run time. Similarly, the data set editor provides the property binding feature to support the selection of data at run time. Figure 2-5 shows the Property Binding page for a JDBC data set. The Query Text property is where you specify an expression that determines at run time what data to select. The Query Text property is also available to other types of data sets. Figure 2-5 22 Provisioning Data Property Binding page for a JDBC data set About the Query Text property When you select data, BIRT stores the query in a property named queryText. Choose the XML Source tab on the report editor to see the XML source that BIRT generates when you create a report. Figure 2-6 shows the part of the XML source that defines the value of the queryText property. In the example shown, the report uses data from the sample database, ClassicModels. The queryText property contains the SQL SELECT statement that specifies the data to retrieve. Figure 2-6 Report’s XML source showing the queryText property for a JDBC data set Figure 2-7 shows an example of a report’s XML source, where the report uses data from a text file named ProductionData.csv. Although the flat file data set does not use a SQL query to select data, the queryText property contains a statement similar to a SQL query. Figure 2-7 Report’s XML source showing the queryText property for a flat file data set Figure 2-8 shows the queryText property for a report that uses a web service data set. This property contains the SOAP request template. Figure 2-8 Report’s XML source showing the queryText property for a web service data set Chapter 2, Understanding data sets 23 Specifying a value for the Query Text property The Query Text property on the Property Binding page of the data set editor is the same as the queryText property in the report source file. The value you specify on the Property Binding page updates the queryText property in the source file, and must use the same format you see in the report source file. In addition, you must enclose the value you type in double quotation marks (" "). The following example shows the correct syntax for specifying a Query Text value for a JDBC data set: "select CUSTOMERNAME, CUSTOMERNUMBER from CUSTOMERS where COUNTRY = 'Australia'" The following example shows the correct syntax for specifying a Query Text value for a flat file data set: "select Date, Open, High from StockHistory.csv" The previous examples showed the syntax for a Query Text value, but both examples specified static data. Typically, when you specify a value for the Query Text property, you use variables or JavaScript expressions that evaluate to a specific value at run time. The following example shows how to select a text file at run time. The JavaScript expression params["pTextFileName"] refers to a report parameter whose value evaluates to a file name specified at run time. "select PRODUCTNAME, QUANTITYINSTOCK, MSRP from " + params["pTextFileName"] The following example shows how to select from a database the customer rows for a particular country. The country value is specified by the user at run time through the report parameter p_Country. "select CUSTOMERNAME, CUSTOMERNUMBER, COUNTRY from CUSTOMERS where COUNTRY = " + "'" + params["p_Country"] + "'" Combining data from multiple data sources The capability to join data sets is a useful and easy way to combine data from two data sources. For example, you can combine data from two XML files, or combine data from a text file with data from a database table. Before joining data sets, you must create the individual data sets. For example, to combine data from an XML file with data from a text file, you must first create the XML data set and the text file data set. Joining data sets is similar to joining tables in a database, but with the limitation that you can join only two data sets. In a database, you can join more than two tables. 24 Provisioning Data Joining two data sets creates a BIRT object called a joint data set. Just as you can with a regular data set, you can add computed columns and filters to a joint data set, and preview the results it returns. Once you understand the concepts of joining data sets, you can be creative about combining data from more than two sources, assuming that the data from the various sources relate in some way. Although each joint data set can join only two data sets, you can use a joint data set as one or both of those data sets. For example, you can create joint data set A and joint data set B, then join both of them. Doing so, in effect, combines data from four data sets. Figure 2-9 illustrates this concept. Data Set 1 accesses XML source A Data Set 2 accesses XML source B Data Set 3 accesses a text file Data Set 4 accesses a database table CustID CustomerName OrderID CustID OrderID Status OrderID Item 01 Mark Smith 110 02 110 Open 110 Printer 02 Maria Hernandez 115 03 115 Shipped 115 Laptop 03 Soo-Kim Young 120 01 120 Open 120 Scanner 04 Patrick Mason Inner Join on CustID Inner Join on OrderID Joint Data Set 1 Joint Data Set 2 CustomerName OrderID OrderID Status Item Mark Smith 120 110 Open Printer Maria Hernandez 110 115 Shipped Laptop Soo-Kim Young 115 120 Open Scanner Inner Join on OrderID Joint Data Set 3 Figure 2-9 CustomerName OrderID Status Item Mark Smith 120 Open Scanner Maria Hernandez 110 Open Printer Soo-Kim Young 115 Shipped Laptop Combining data from four data sets As Figure 2-9 also shows, each data set can return data from different types of sources. You could also use joint data sets to join tables in a single database. For performance reasons, however, this usage is not recommended. Where possible, you should always join multiple tables through the SQL SELECT statement. You should create joint data sets only to: ■ Combine data from disparate data sources. ■ Combine data from non-relational data sources, such as XML or text files. Chapter 2, Understanding data sets 25 The types of joins used to join data sets yield the same results as the similarly named database joins. Like the database joins, you must specify a column on which to join the two data sets. The following list describes each join type: ■ Inner join returns rows from both data sets where the key values match. ■ Left outer join returns all rows from the first data set, even if there are no matches in the second data set. ■ Right outer join returns all rows from the second data set, even if there are no matches in the first data set. ■ Full outer join returns all rows from both data sets, even if there are no matches in either data set. How to join data sets 1 In Data Explorer, right-click Data Sets, and choose New Joint Data Set. 2 Select the first data set for the joint data set from the drop-down list at the left of New Joint Data Set. The columns of the first data set appear in the panel below your selection. 3 Select the second data set for the joint data set from the drop-down list at the right of New Joint Data Set. The columns of the second data set appear. 4 Select the columns to join. Select one column from the first data set, and one column from the second data set. Typically, you select the columns that are common to both data sets. BIRT Report Designer does not prevent you from selecting two unrelated columns. Doing so, however, typically does not provide the correct results. Figure 2-10 shows an example of a joint data set definition. Figure 2-10 26 Provisioning Data Joint data set definition 5 Select a join type, then choose Finish. 6 In Edit Data Set, choose Preview Results to see the rows returned by the joint data set. Combining data from multiple data sources in BIRT Designer Professional Sometimes, the data that a report requires originates in several data sources. For example, an application system generates monthly transaction data and the data for each month is saved in a separate CSV file, or a system saves data in different formats, such as XML and CSV. In BIRT Report Designer, you can combine data from multiple data sources by creating a joint data set. BIRT Designer Professional provides two additional options for combining data from multiple sources. You can create a union data set or a joined data set. The option you choose depends on the data structures and the results you want. Both options entail combining data sets, so before creating a union data set or a joined data set, you first create the individual data sets. For example, to combine data from an XML file with data from a CSV file, you must first create the XML data set and the flat file data set. Creating a union data set A union data set combines the results returned by two or more data sets. Creating a union data set is similar to using a SQL UNION ALL statement, which combines the result sets of multiple SELECT statements into a single result set. Create a union data set to consolidate data from multiple sources that have similar data structures. For example, a company uses separate database tables to store contact information about employees and contractors. The structures of the tables are similar. Both contain Name and Phone fields. Suppose you want to create a master contact list for all employees and contractors. The solution is to create one data set to retrieve employee data, a second data set to retrieve contractor data, and a union data set to combine data from the previous data sets. Figure 2-11 illustrates the data sets that return employee and contractor data. Employees data set Contractors data set Name Phone E-mail Mark Smith 650-343-2232 [email protected] Name Phone Sarah Brown 650-545-3645 Patrick Mason 650-343-1234 [email protected] Sean Calahan 415-242-8254 Soo-Kim Yoon 650-343-5678 [email protected] Paula Mitchell 650-662-9735 Maria Gomez Michael Lim 408-234-2645 Figure 2-11 650-343-9876 [email protected] Data sets with common fields for employee and contractor data Chapter 2, Understanding data sets 27 When creating a union data set, you select the fields to include. Figure 2-12 shows a union data set that includes all the fields from both the Employees and Contractors data sets. The Name field contains all employee and contractor names. The Phone field also contains all employee and contractor phone numbers. The E-mail field exists only in the Employees data set, so only employee data rows have e-mail data. Union data set Name Phone E-mail Mark Smith 650-343-2232 [email protected] Patrick Mason 650-343-1234 [email protected] Soo-Kim Yoon 650-343-5678 [email protected] Maria Gomez 650-343-9876 [email protected] Sarah Brown 650-545-3645 Sean Calahan 415-242-8254 Paula Mitchell 650-662-9735 Michael Lim Figure 2-12 408-234-2645 Union data set of all data from Employees and Contractors data sets Figure 2-13 shows a union data set that includes only the common fields, Name and Phone, from the Employees and Contractors data sets. Union data set Name Phone Mark Smith 650-343-2232 Patrick Mason 650-343-1234 Soo-Kim Yoon 650-343-5678 Maria Gomez 650-343-9876 Sarah Brown 650-545-3645 Sean Calahan 415-242-8254 Paula Mitchell 650-662-9735 Michael Lim Figure 2-13 408-234-2645 Union data set of common fields in Employees and Contractors data sets In the previous example, the two data sets used to create a union data set contained common fields with the same names. This condition is required for consolidating data into a single field. However, data sources often use different field names. Suppose the Name field in the Employees and Contractors data sets is EmployeeName and ContractorName, respectively. To create a union data set that consolidates employee and contractor names in a single field, rename the field names in the individual data sets to use the same name. When creating the 28 Provisioning Data Employees data set, in Output Columns, use the Alias property to give the EmployeeName field another name. Figure 2-14 shows the EmployeeName field with the alias, Name. Figure 2-14 Alias specified for the EmployeeName field Similarly, when creating the Contractors data set, edit the ContractorName field to use the same alias. How to create a union data set This procedure assumes that you have created the data sets to include in the union data set. 1 In Data Explorer, right-click Data Sets, and choose Union Data Set. 2 In New Data Set, in Data Set Name, optionally type a name for the union data set. 3 Choose New. 4 In New Union Element, in Select Data Set, select the first data set that contains the data to include in the union data set. The fields in the selected data set appear, as shown in Figure 2-15. Figure 2-15 Fields in a data set selected for a union data set Chapter 2, Understanding data sets 29 5 Select the fields to include in the union data set, then choose OK. 6 Repeat steps 3 to 5 to add the next data set to the union data set. Figure 2-16 shows a union data set named Master Customer List that consists of fields from two data sets, PlatinumCustomers and GoldCustomers. Figure 2-16 Definition of a union data set that combines two data sets Choose Finish. Edit Data Set displays the selected fields, and provides options for editing the data set. 7 Choose Preview Results. Figure 2-17 shows the rows returned by the Master Customer List union data set. Figure 2-17 30 Provisioning Data Data rows returned by the union data set Creating a joined data set A joined data set combines the results of two or more data sets that are related through a common key. Creating a joined data set is similar to joining tables in a database using a SQL JOIN clause. Use a joined data set to combine data from different data sources in which a relationship exists, as shown in the following example. Figure 2-18 illustrates data sets that return data about customers and orders. The data sets are related through the CustomerID field. You can retrieve order information for each customer by joining the data sets. Customers data set Orders data set CustomerName CustomerID OrderID Amount CustomerID Sarah Brown 1001 110 1500.55 1003 Sean Calahan 1002 115 12520.00 1001 Paula Mitchell 1003 120 8450.50 1004 Michael Lim 1004 125 7550.00 1002 Figure 2-18 Data sets with a common field returning customer and order data Figure 2-19 shows the results of joining the customers and orders data sets on the CustomerID key, and displaying only the CustomerName and Amount fields in the joined data set. CustomerName Amount Sarah Brown 12520.00 Sean Calahan 7550.00 Paula Mitchell 1500.55 Michael Lim 8450.50 Figure 2-19 Data rows returned when the customers and orders data sets are joined BIRT Designer Professional supports the functionality of joined data sets available in the open source version, and provides the following additional features: ■ The capability to join more than two data sets ■ The capability to join on more than one key ■ Support for additional join operators: <>, <, >, <=, >= ■ Support for the side-by-side join Unlike the other types of supported joins (inner, left outer, right outer, and full outer), the side-by-side join links data sets without requiring a key. The resulting joined data set displays the selected fields side by side. Figure 2-20 shows two data sets that do not share a common field. The first data set returns customer data, and the second data set returns order data. Chapter 2, Understanding data sets 31 Customers data set Orders data set CustomerName CustomerID OrderID Amount Sarah Brown 1001 110 1500.55 Sean Calahan 1002 115 12520.00 Paula Mitchell 1003 120 8450.50 Michael Lim 1004 125 7550.00 Figure 2-20 Data sets without a common field Figure 2-21 shows the results of joining the customers and orders data sets using the side-by-side join. When using this type of join, do not misinterpret the results. As Figure 2-21 shows, the data from the two data sets appear side by side, implying that each customer has a relationship with an order when, in fact, no such relationship exists. CustomerName CustomerID OrderID Amount Sarah Brown 1001 110 1500.55 Sean Calahan 1002 115 12520.00 Paula Mitchell 1003 120 8450.50 Michael Lim 1004 125 7550.00 Figure 2-21 Results of a side-by-side join How to create a joined data set This procedure assumes that you have created the data sets to include in the joined data set. 1 In Data Explorer, right-click Data Sets, and choose Join Data Set. 2 In New Data Set, in Data Set Name, optionally type a name for the joined data set. 3 Specify the data sets to use in the joined data set. From Available data sets, drag each data set to the editing area. Figure 2-22 shows three data sets in the editing area. 4 Specify the fields from each data set to include in the joined data set. Perform the following tasks for each data set: 1 Select a data set by clicking anywhere in the image of the data set. Do not click a field name. 2 Choose Output Columns. 3 In Edit Data Set Properties, under Select output columns, select the desired data set fields, then choose OK. 32 Provisioning Data Figure 2-22 Three data sets selected for a joined data set 5 Specify the conditions for joining the data sets. Perform the following tasks for each pair of data sets. In the example shown in Figure 2-22, specify a condition for joining the first and second data sets, and a condition for joining the second and third data sets. 1 Select the arrow between two data sets. 2 Choose Conditions. 3 In Define join type and join conditions, specify the following information: 1 In Join Type, select the type of join to use. 2 If you select a join type other than Side-By-Side, define a join condition. ❏ Choose New. ❏ Select the fields on which to join, and select an operator that specifies how to compare the values in the fields being joined. Figure 2-23 shows a join condition that combines data when the CUSTOMERNUMBER value in the Customers data set is equal to the CUSTOMERNUMBER value in the Orders data set. ❏ Choose OK. Chapter 2, Understanding data sets 33 Figure 2-23 Joining data sets on a common field Define join type and join conditions displays the specified condition, as shown in Figure 2-24. Figure 2-24 Definition of an inner join 4 Choose OK. 6 Choose Finish to save the joined data set. Joining on more than one key You can specify more than one join condition when joining two data sets. For example, you can join a customers data set with a sales offices data set, shown in Figure 2-25, to find the names of customers and sales managers that are located in the same city and state. 34 Provisioning Data Customers data set Sales Offices data set City State SalesMgr San Francisco California Los Angeles California Robert Diaz Los Angeles California New York New York Monica Blair Paula West New York New York San Francisco California Susan Kline Joanne Kim San Diego California Name City Mark Smith Patrick Mason Figure 2-25 State Data sets with two common fields, City and State You would create the following join conditions: ■ The first condition, shown in Figure 2-26, compares the State values in the Customers and SalesOffices data sets and looks for a match. Figure 2-26 ■ Joining on a State field The second condition, shown in Figure 2-27, compares the City values in both data sets and looks for a match. Figure 2-27 Joining on a City field The joined data set returns the results shown in Figure 2-28, if the join type is fullOuter. Chapter 2, Understanding data sets 35 Figure 2-28 Data rows returned by the joined data set Specifying a join condition not based on equality The condition for joining values in two fields is usually based on equality (=), as shown in all the examples so far. Less common are join conditions that use any of the other comparison operators: not equal (<>), greater than (>), less than (<), greater than or equal to (>=), and less than or equal to (<=). The following example shows the use of joins that are not based on equality. In the example, a Sales data set is joined with a Commissions data set. The joined data set uses a >= join and a < join to look up the commissions to pay to sales managers, based on their sales totals and management levels. Figure 2-29 shows the Sales and Commissions data sets. In the Commissions data set, each level has four commission rates. For level 1, a commission rate of 25% is paid if a sales total is between 75000 and 100000, 20% is paid if a sales total is between 50000 and 75000, and so on. Sales data set Commissions data set SalesMgr Level TotalSales Level LowRange HighRange Susan Kline 1 55000 1 75000 100000 25 Robert Diaz 2 45000 1 50000 75000 20 Monica Blair 2 28000 1 25000 50000 15 Sean Calahan 1 23000 1 15000 25000 10 2 70000 100000 25 2 45000 70000 20 2 20000 45000 15 2 10000 20000 10 Figure 2-29 Commission Data sets returning sales and commission rates data The following join conditions specify the fields on which to join and how to compare the values in the fields being joined: 36 Provisioning Data ■ The first condition, shown in Figure 2-30, compares the Level values in the Sales and Commissions data sets and looks for a match. Figure 2-30 ■ The second condition, shown in Figure 2-31, uses the >= operator to compare the TotalSales values in the Sales data set with the LowRange values in the Commissions data set. Figure 2-31 ■ Joining on the Level field and looking for a match Joining on TotalSales and LowRange fields using the >= operator The third condition, shown in Figure 2-32, uses the < operator to compare the TotalSales values in the Sales data set with the HighRange values in the Commissions data set. Figure 2-32 Joining on TotalSales and HighRange fields using the < operator Chapter 2, Understanding data sets 37 The second and third join conditions check if a sales total is greater than or equal to LowRange and less than HighRange. The joined data set returns the results shown in Figure 2-33. Figure 2-33 Data rows returned by the joined data set Getting information about query execution When a report accesses data from a database, it is useful to understand what queries the report sends to the database, and how charts and tables get their data. For example, if you create a dynamic filter on a table to display sales data for certain products only, does BIRT send a query to retrieve sales data for all products then filter at the table level to display data for specific products, or does BIRT send a query that retrieves only data for specific products? Answers to questions such as this can help you optimize the performance of a report. To get information about the queries that are executed, right-click a report element, such as a table or a chart, then choose Show Query Execution Profile. Figure 2-34 shows an example of a query that is executed for a table. In this example, Query Execution Profile shows the following information: 38 ■ The data set (Products Data Set) that is bound to the table, the original query specified, and the query modified by BIRT and sent to the database ■ A sort definition that sorts data rows by product name in ascending order ■ A filter condition (row["QUANTITYINSTOCK"] > 5000 ■ A group definition that groups data by vendor ■ Data bindings associated with the table Provisioning Data Figure 2-34 Query execution profile for a table Reviewing where sorts, groups, and filters are executed Sorting, grouping, and filtering data are resource-intensive activities. The performance of a report improves when data is processed by the database rather than by BIRT. To see where data sorting, grouping, and filtering are processed, select each of these items in the query execution profile. For example, click a specific filter, as shown in Figure 2-35, to see whether the filter is executed in BIRT or at the database level. In the filter information, “Push Down: applied” means that the filter is pushed down to, or executed by, the database. Similarly, select the sort and group definitions to see where these tasks are executed. Reviewing the query that is sent to the database Another piece of useful information that the query execution profile provides is whether, and how, BIRT modifies a query when you sort, group, or filter data using the graphical tools. BIRT can modify a query to perform these tasks at the database level if the report accesses the database through an information object or a JDBC connection for query builder data source. Chapter 2, Understanding data sets 39 Figure 2-35 Filter information displayed in the query execution profile Select the Original: SELECT statement to see the query specified originally. Select the Effective: SELECT statement to see the query modified by BIRT. Figure 2-36 shows an example of the SELECT statement in the original query. Figure 2-37 shows an example of the SELECT statement in the modified query. Figure 2-36 40 Original query displayed in the query execution profile Provisioning Data As Figure 2-37 shows, BIRT changes the original query to add a filter condition (WHERE clause) and a sort condition (ORDER BY clause). Figure 2-37 Modified (effective) query displayed in the query execution profile Where data filtering occurs can affect performance significantly because filtering can mean the difference between retrieving hundreds or millions of rows of data. When you create filters using the graphical filter tool, BIRT pushes a filter to the database if the filter condition can be mapped to a SQL expression (if using the JDBC connection for query builder data source) or an Actuate SQL expression (if using an information object data source). Using that criterion, the following are examples of when BIRT pushes a filter to the database: ■ The filter uses an operator supported by the database, for example, <, >, =. BIRT-specific operators, such as Match, Top Percent, and Bottom Percent, do not have SQL equivalents, so a filter that uses any of these operators is not pushed to the database. ■ The filter uses an expression that refers to a field in a database table. For example, the following filter condition is pushed to the database if SalesTotal is a column in the database table: row["SalesTotal"] Greater than 5000000 On the other hand, the following filter condition is not pushed to the database if Profit is a computed column derived from other columns, for example, row["Sales"] - row["Cost"]: row["Profit"] Greater than 2000000 Chapter 2, Understanding data sets 41 42 Provisioning Data Chapter 3 Binding data Chapter 3 This chapter contains the following topics: ■ About binding data ■ Understanding column bindings ■ Creating column bindings ■ Editing and deleting column bindings ■ Copying data elements ■ More about column-binding expressions Chapter 3, Binding data 43 About binding data The data set or data sets that you create provide the data you want to use in a report. Before you can use or display this data in a report, you must first create the necessary data bindings. To display the data in a report, you simply drag data set fields from Data Explorer to a table in the layout editor. Each time you insert a data set field, BIRT creates a data binding. This data binding, called a column binding, defines an expression that specifies what data to display. The column binding also defines a name that report elements use to access data. To view the column bindings that BIRT creates for each data set field that you place in a table, select the table, then, in Property Editor, choose the Binding tab. Figure 3-1 shows an example of column bindings created for each data set field in a table. Figure 3-1 Table’s binding page showing four column bindings Understanding column bindings For each piece of data to display or use in a report, there must be a column binding. For this discussion, note that data refers to dynamic data, and not the static text that you type for a label. Dynamic data is data from a data set, or data that is calculated from a function or a formula. The data is dynamic because the values are not fixed at design time. 44 Provisioning Data The default column binding, which BIRT Designer creates for a data set field, uses the data set field name as the name of the column binding. In Figure 3-1, the expression defined for the first column binding is dataSetRow["CITY"]. This expression indicates that the column binding accesses data from the data set field, CITY. In the layout editor, the column-binding name appears within square brackets ([]), as shown in Figure 3-1. Column bindings form an intermediate layer between data set data and report elements—such as chart, data, dynamic text, and image elements—that display data. Figure 3-2 illustrates this concept. Report elements can access data only through column bindings. Report elements Column bindings Data element Product_Name Product_Name Data element Product_Code Product_Code Image element Product_Image Product_Image Figure 3-2 Data set fields Report elements access data set data through column bindings The preceding examples show column bindings that access data in a data set. Column bindings can also access data derived from functions or user-defined formulas. For example, you can use a data element to display the current date derived from the JavaScript Date object. You would create a column binding that uses the following expression: new Date() Figure 3-3 shows this column binding definition. Figure 3-3 User-defined column binding Descriptive names One of the benefits of using column bindings is that you control the names used in the report. Instead of displaying data set field names, which are often not descriptive enough, or formulas, which can be long, you can specify short Chapter 3, Binding data 45 and descriptive names. If you share report designs with other report developers, descriptive names make that design much easier to understand. Modifying and maintaining a report design that has user-friendly names is easier. Dynamic updates of calculated data Another advantage of column bindings becomes apparent when working with calculated data. When a report needs to display a series of related calculated data, column bindings enable you to create and update calculations easily. For example, assume a report contains the following four data elements: ■ The first data element uses column binding, Order_Total, which uses the SUM function, and the following expression to calculate the sum of all order line items: dataSetRow["pricequote"] * dataSetRow["quantity"] ■ The second data element uses column binding, Sales_Tax, which refers to the previous column binding, Order_Total, to calculate the sales tax. The expression defined for the Sales_Tax column binding is: row["Order_Total"] * 0.08 Without using column bindings, the second data element must use the SUM function and the following longer expression to calculate sales tax: (dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.08 ■ The third data element uses column binding, Shipping_Charge, which also refers to the first column binding, Order_Total, to calculate the shipping charge. The expression defined for the Shipping_Charge column binding is: row["Order_Total"] * 0.02 Again, without using column bindings, the third data element must use the SUM function, and the following lengthier expression to calculate the shipping charge: (dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.02 ■ The fourth data element uses column binding, Invoice_Total, which refers to all the previous column bindings to calculate the grand total. The expression defined for the Invoice_Total column binding is: row["Order_Total"] + row["Sales_Tax"] + row["Shipping_Charge"] Without column bindings, the expression would be more complicated: (dataSetRow["pricequote"] * dataSetRow["quantity"]) + ((dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.08) + ((dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.02) You have already seen how column bindings make expressions shorter and more readable. Now, consider the case where you need to update one calculation that is 46 Provisioning Data used by other calculations. Suppose you need to change how Order_Total is calculated, from: dataSetRow["pricequote"] * dataSetRow["quantity"] to: (dataSetRow["pricequote"] * dataSetRow["quantity"]) dataSetRow["discount"] Because the second, third, and fourth data elements use Order_Total in their calculations, without using column bindings, you must manually edit those calculations as well. For example, without using column bindings, you would have to revise the expression for the fourth element as follows: ((dataSetRow["pricequote"] * dataSetRow["quantity"]) dataSetRow["discount"]) + ((dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.08) + ((dataSetRow["pricequote"] * dataSetRow["quantity"]) * 0.02) By using column bindings, any change to the first calculation automatically applies to the second, third, and fourth calculations. By modifying only one expression instead of three, your work is faster and less error-prone. Creating column bindings When you drag a data set field from Data Explorer to a table in the layout editor, BIRT creates the column binding. When you bind a table to a data set, BIRT also creates a column binding for each field in the data set. For other cases, when inserting a dynamic text, text, image, or data element from the palette, you manually create the column binding if you want the element to display dynamic data. If the information to display is static—for example, a specific image stored in a file system, or literal text—then column binding is not needed. How to create a column binding This procedure shows an example of creating a column binding for a data element. 1 Drag a data element from the palette and drop it in the report. 2 In New Data Binding, create a new column binding: 1 In Column Binding Name, specify a unique name for the column binding. 2 In Display Name, optionally specify a different name to display in the report design. If you leave this property blank, the report design displays the Column Binding Name value. Chapter 3, Binding data 47 3 In Data Type, select a data type appropriate for the data returned by the expression you specify next. 4 In Expression, specify the expression that indicates the data to return, using one of the following methods: ❏ Type the expression directly in the Expression field. ❏ If you need help constructing the expression, choose the expression builder button to launch the expression builder. Figure 3-4 shows an expression in the expression builder that combines the values of two data set fields selected from the Customers data set. Figure 3-4 The expression builder showing a column-binding expression Choose OK when you finish constructing the expression. Figure 3-5 shows an example of a column-binding definition. Figure 3-5 New column-binding definition 5 Choose OK to save the column binding. 48 Provisioning Data In the layout editor, the report design displays [Contact_Name] in the location where you inserted the data element, indicating that the data element uses the column binding you defined. 3 Preview the report. The data element displays the data defined in the column binding expression. Editing and deleting column bindings Be careful when editing or deleting column bindings. More than one element can use a column binding, and a column binding can refer to other column bindings. Earlier in this chapter, you saw examples of how a change to a calculated-data expression cascaded to other expressions. The ease with which you can dynamically update formulas that refer to column bindings also requires that you be aware of those dependencies. To minimize errors, BIRT allows you to edit the data type, display name, and the expression, but not the name of the column binding. Figure 3-6 shows Edit Data Binding, which opens when you double-click a data element in the report. The value in Column Binding Name is read-only. Figure 3-6 Edit Data Binding showing the definition of a column binding If you could edit the name of the column binding, expressions in other column bindings that refer to that column binding would no longer be valid, unless you also update all expressions to refer to the renamed column binding. For example, as Figure 3-6 shows, the expression in Sales_Tax refers to a column binding named Order_Total. If you could rename Order_Total, the expression in Sales_Tax, row["Order_Total"] * 0.08, would refer to a column binding that no longer exists. Deleting a column binding that is used by multiple elements results in errors in the report design. Consider the following scenario: You drop a data set field, COUNTRY, in a table. BIRT Designer creates a column binding named COUNTRY that refers to the data set field. You create a sort condition to display rows alphabetically by country names. The sort expression uses the COUNTRY column binding. Later, you decide not to Chapter 3, Binding data 49 display the COUNTRY values in the report. You delete the [COUNTRY] data element from the table. To maintain an accurate list of column bindings used in the table, you also delete the COUNTRY column binding from the table’s Binding page. When you run the report, BIRT displays an error message because the sort expression still refers to the COUNTRY column binding, which no longer exists. Before you edit or delete a column binding, consider these guidelines: ■ A change to a column binding’s expression applies to other column bindings that refer to that column binding. ■ A column binding can be used in a variety of expressions, not just by data elements that display field values in the report. For example, expressions used to sort, group, filter, or highlight data also refer to column bindings. If you delete a column binding, preview the report immediately to make sure you have not introduced any errors. If the report no longer generates, use the Undo functionality to restore the report to its previous state. Copying data elements Another action to be careful about is copying and pasting data elements. In a report that displays the same or similar data in multiple places, the natural inclination is to copy the data element and paste it elsewhere in the report. For example, you want to display the order ID in two places: the detail row and header row of a table. You already inserted the order ID data element in the detail row, so you copy the data element and paste it in the header row. When you run the report, the order ID appears in both places. Later, you decide to add static text, Order Number:, to the order ID value that appears in the header row. You double-click the data element in the header row, and change the expression in the Edit Data Binding dialog from: dataSetRow["ORDERNUMBER"] to: "Order Number: " + dataSetRow["ORDERNUMBER"] When you choose OK to save the change, BIRT prompts you to choose one of the following options: ■ Create a new column binding for the selected data element. This option enables you to make changes only to the selected data element. ■ Update both data elements to use the new expression. Figure 3-7 shows the message that BIRT displays when you edit an expression for a column binding that is used by multiple data elements. 50 Provisioning Data Figure 3-7 Options for editing a column-binding expression When you copy and paste a data element, all the copies refer to a single instance of the column binding. Because this behavior is not apparent, BIRT asks how you want a change in a column binding’s expression or data type to be processed. Otherwise, you might assume that selecting a data element and editing its column binding information affects only the selected data element. More about column-binding expressions When writing an expression for a column binding, the expression can refer to data set fields, other column bindings, functions, and operators. The expression builder simplifies writing an expression by displaying the available items—data set fields, column bindings, functions, and so on. The items available in the expression builder change depending on where you define the column binding. For example, if you insert a data element in a table that contains other column bindings, the data element can access those column bindings. If you insert a data element directly on the report page, the data element cannot access column bindings defined for the table or any other report element. When you select an item in the expression builder, the expression builder adds the item to the expression with the proper syntax. When a column-binding expression refers to a data set field, the syntax is: dataSetRow["datasetField"] When a column-binding expression refers to another column binding, the syntax is: row["columnBinding"] If you use the expression builder to construct column-binding expressions, you do not need to remember what syntax to use. You will find it helpful, though, to understand what each syntax means because the expression examples that appear throughout the book use both syntaxes. Chapter 3, Binding data 51 52 Provisioning Data Part Two 2 Accessing data in supported data sources Part 2 ■ Accessing data in a data object ■ Accessing data in a JDBC database ■ Accessing data in a NoSQL database ■ Accessing data in files ■ Accessing data from a web service ■ Accessing data in a POJO Chapter 4 Accessing data in a data object Chapter 4 This chapter contains the following topics: ■ Using data object data in a report ■ Connecting to a data object ■ Using a data set in a data object ■ Using a data model in a data object ■ Using a cube in a data object Chapter 4, Accessing data in a data object 55 Using data object data in a report A data object provides a report access to predesigned data sources, data sets, data models, and cubes. Report developers create data objects to streamline the report creation process. Data objects provide the following benefits: ■ Simplified data access and retrieval. The predesigned data sources, data sets, data models, and cubes in a data object enable report developers to select the data to use in a report without knowledge of the underlying data source, how to connect to it, and how to extract data from it. ■ Reusability across multiple reports. If a suite of reports require the same data, designing the data sources, data sets, data models, and cubes once in a shared data object eliminates the need to design the same elements repeatedly for each report. ■ Dynamic updates to data items. Changes to data items in a data object propagate to reports that use the data object, ensuring that reports have the latest updates, such as connection properties. A report accesses data from a data object through either a data object design (.datadesign) file or a data object store (.data). The data design file retrieves data, on demand, each time the report is run. A data object store contains cached, or materialized, data, and provides much more efficient access to data. If getting real-time data is more important than report generation speed, use the data object design file. If data in the underlying data source does not change constantly, or if a data object store is generated regularly, use the data object store. As with other types of data sources, for a report to use data from a data object, you must create the following BIRT objects: ■ A data source that contains the information to connect to a data object ■ A data set that specifies the data to use from a data set, data model, or cube in the data object Connecting to a data object When creating a data source to connect to a data object, the only information required is the name of the data object. How to connect to a data object 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 56 Provisioning Data 1 Select Actuate Data Object Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. Choose Next. 3 In New Actuate Data Object Data Source, next to Data Object, choose Browse. Select Data Object File displays all the data objects (.datadesign and .data files) in the resource folder. Select the data object to use in the report, then choose OK. Figure 4-1 shows an example of a data object data source that connects to a data object named Revenue.data. Figure 4-1 Data object selected 4 Choose Next. This button is enabled if the data object contains report parameters. Provide the parameter values. 5 Choose Finish. The data source appears under Data Sources in Data Explorer. The report now has access to all the data sets, data models, and cubes defined in the data object. Using a data set in a data object Because a data object contains predesigned data sets, all you do is select a data set from the data object and the columns from the selected data set. How to specify what data to retrieve from a data object 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the data object data source to use. Data Set Type displays Actuate Data Object Data Set. 2 In Data Set Name, type a name for the data set. Chapter 4, Accessing data in a data object 57 3 Choose Next. 3 In New Actuate Data Object Data Set, in Available Data Sets, select one of the data sets defined in the data object. 4 Select the columns to retrieve, and move them to the right pane. 5 Choose Finish to save the data set. Edit Data Set displays the columns in the data set, as shown in Figure 4-2. Figure 4-2 Columns in a data set 6 Choose Preview Results to view the data rows that the data set returns. 7 Choose OK to close the data set editor. Using a data model in a data object You can use a data model to provide data for a report. You can create a data model, using data from a data set, or you can use a predesigned data model in the data object. The option you choose depends on how familiar you are with creating data models, whether you want modifications to the original data model to propagate to the report, or whether you need control over the data model data. You cannot edit a data model from a data object. How to use a data model in a data object 1 Create a report. 2 Create a data source using the data object that contains the data model. 3 In Data Explorer, right-click Data Models and Data Cubes and choose Use Data Object Data Model. 58 Provisioning Data 4 In Use Data Model, in Available Data Models, select the appropriate data model. Choose OK. 5 Expand Data Models and Data Cubes, as shown in Figure 4-3. Figure 4-3 Data Explorer showing a data model 6 Drag the appropriate report elements from the Palette to the layout pane. For example, to create a cross tab report, drag a Cross Tab element into the layout pane. 7 Drag the appropriate fields from the data model in Data Explorer to the element in the layout pane. Using a cube in a data object To add a cross tab to a report, you can use a cube to provide data for the cross tab. You can create a cube, using data from a data set, or you can use a predesigned cube in the data object. The option you choose depends on how familiar you are with creating cubes, whether you want modifications to the original cube to propagate to the cross tab, or whether you need control over the cube data. You cannot edit a cube from a data object. How to use a cube in a data object 1 In Data Explorer, right-click Data Models and Data Cubes, then choose Use Data Object Cube. 2 In Use Data Object Cube, specify the following information. Then choose OK. 1 In Name, type a name for the cube. 2 In Select Data Source, select the data source that connects to the data object that contains the cube. 3 In Available Data Cubes, select a cube. Figure 4-4 shows the selection of a cube named Revenue from a data object data source named Revenue Expenses Data Object. Chapter 4, Accessing data in a data object 59 Figure 4-4 60 Provisioning Data Cube selected from a data object Chapter 5 Accessing data in a JDBC database Chapter 5 This chapter contains the following topics: ■ About JDBC databases ■ Accessing data using a textual query or stored procedure ■ Accessing data using a graphical query Chapter 5, Accessing data in a JDBC database 61 About JDBC databases Most relational databases, such as Oracle, SQL Server, or MySQL, use JDBC drivers. A report can access data from any database or other data source that uses a JDBC driver. If you are using BIRT Designer, make sure you have the appropriate JDBC driver for the database that you want to access. BIRT supports JDBC 3.0 drivers. Download the necessary drivers from a data source vendor or third-party web site. If you are using BIRT Designer Professional, JDBC drivers for commonly used databases are already installed. To retrieve data from a JDBC database, you can write a SQL query textually or construct it graphically, or you can use a stored procedure. To write a SQL query textually or use a stored procedure, create a JDBC data source. To construct a SQL query graphically, create a JDBC Database Connection for Query Builder data source. Accessing data using a textual query or stored procedure If you want to retrieve data from a database using a textual SQL query or a stored procedure, create a JDBC data source. Creating a JDBC data source When creating a JDBC data source in BIRT Designer, you select the driver class and provide a URL to connect to the database. Typically, the URL uses either a server name and port number or a database server instance name to connect to a server. Additional parameters or property-value pairs define the database to which to connect. For example, Microsoft SQL Server prior to the 2014 release accepts the following syntax that specifies either a port number or a named server instance: jdbc:actuate:sqlserver://SERVER[:PORT][\INSTANCE] [;databasename=??][;PROP=VALUE]* SQL Server 2014 and later requires a named server instance, shown in the following syntax: jdbc:actuate:sqlserver://SERVER\INSTANCE [;databasename=??][;PROP=VALUE]* If the database requires user authentication, provide a user name and a password. If the database uses connection pooling, specify the path to the service. Many databases use a connection pool—a cache of database connection objects—to minimize the overhead of establishing a connection to the database. A connection pool optimizes database performance through the reuse of connections, which 62 Provisioning Data reduces the need to repeatedly open and close connections each time a report is run. How to specify the connection information for a database or other JDBC data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select JDBC Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. The name must be unique in the current report. Choose Next. New JDBC Data Source Profile appears, as shown in Figure 5-1. Figure 5-1 Defining JDBC connection information 3 Specify the connection information for the JDBC data source. 1 In Driver Class, choose a driver class from the drop-down list. If you do not see the driver class that you need, add the driver as described in “Adding a JDBC driver,” later in this chapter. 2 In Database URL, type the URL for the database, using the format that the driver requires. For a MySQL database, for example, the format is: jdbc:mysql://:/ 3 In User Name, type a valid user name to connect to the JDBC data source. This field can be left blank if the data source does not require a user name. 4 In Password, type a valid password to connect to the JDBC data source. This field can be left blank if the data source does not require a password. 5 In JNDI URL, type the full path to the connection pooling service, if applicable. The following path is an example: java:comp/env/jdbc/MyDataSource In the example, MyDataSource is the name of the JNDI database service. Chapter 5, Accessing data in a JDBC database 63 4 To ensure that the connection information is correct, choose Test Connection. If Test Connection returns an error, repeat the preceding steps to correct the error. Then, test the connection again. 5 Choose Finish. The new JDBC data source appears under Data Sources in Data Explorer. Managing JDBC drivers Use the JDBC driver manager to install and manage drivers. You access the JDBC driver manager, shown in Figure 5-2, from the data source editor. To indicate which drivers are available for use, the JDBC driver manager displays symbols next to the file names in the JAR files list. ■ A symbol x indicates that a file, previously available to BIRT, is no longer in the JDBC drivers directory. Reports that use this driver cannot access the underlying data source. To solve this problem, you must restore the driver in the JDBC drivers directory. ■ An asterisk (*) indicates that a file does not exist in the original location from which you installed the driver. Reports that use this driver can still access the underlying data source as long as the driver is in the JDBC drivers directory. ■ A plus sign (+) indicates that a file has been restored to the JDBC drivers directory. In Figure 5-2, the JDBC driver manager indicates that the JAR file is no longer in the JDBC drivers directory. Figure 5-2 Managing JDBC drivers Adding a JDBC driver To install a JDBC driver, the JDBC driver manager prompts you to select the JAR file that contains the JDBC driver. The JAR file you select is copied to the following directory: eclipse\plugins \org.eclipse.birt.report.data.oda.jdbc_\drivers 64 Provisioning Data The driver or drivers contained in the JAR file located in this directory are available to all your report designs. How to add a JDBC driver This procedure assumes you are creating a new JDBC data source, and you need to install a new driver because the driver that the database requires is not available in the list of drivers. 1 In New JDBC Data Source Profile, shown in Figure 5-1, choose Manage Drivers. 2 In Manage JDBC Drivers, choose Add to install the JAR file that contains the driver. 3 Navigate to the directory that contains the JAR file. Select the JAR file and choose Open. Manage JDBC Drivers shows the new JAR file. 4 Choose Drivers to see the list of installed drivers, as shown in Figure 5-3. Figure 5-3 Viewing JDBC driver classes 5 Optionally, set the properties for a driver, using the following steps: 1 Select the new driver, then choose Edit. 2 In Edit JDBC Driver, specify the following information: ❏ In Driver Display Name, type a user-friendly name that describes the driver. This name appears in parenthesis next to the driver class name in Driver Class on New JDBC Data Source Profile. ❏ In URL Template, type the URL format that the driver requires. This URL format appears in Driver URL on New JDBC Data Source Profile. Figure 5-4 shows an example of properties specified for a MySQL JDBC driver. Chapter 5, Accessing data in a JDBC database 65 Figure 5-4 Properties specified for a JDBC driver 3 Choose OK. Manage JDBC Drivers displays the new display name and URL template syntax suggestion. 6 In New JDBC Data Source Profile, specify the connection properties to connect to the JDBC data source. The Driver Class list displays the installed driver. Deleting a JDBC driver If you are upgrading a driver, delete the JAR file that contains the current driver before adding a new file. If the JAR file contains more than one driver, the driver manager deletes all drivers in the JAR file. How to delete JDBC drivers in a JAR file 1 Right-click an existing JDBC data source, then choose Edit from the context menu. Edit Data Source appears. 2 Choose Manage Drivers to open the JDBC driver manager. 3 Select the JAR file that contains the driver, then choose Delete. BIRT Designer removes the JAR file and any drivers that it contains from the JDBC drivers directory. Restoring a JDBC driver When you install a new version of BIRT Designer, the process replaces all the files in the eclipse directory and all the drivers installed in the JDBC drivers directory are lost. Reports that use a JDBC driver to access a data source display an error when you run them. Figure 5-5 shows an example of such an error. To solve this problem, use the JDBC driver manager to restore the drivers. 66 Provisioning Data Figure 5-5 Report displaying a Cannot load JDBC Driver class error How to restore a JDBC driver 1 In Data Explorer, right-click the JDBC data source, then choose Edit from the context menu. Edit Data Source appears. 2 In Edit Data Source, choose Manage Drivers to open the JDBC driver manager. The JDBC driver manager indicates that the driver used by the report no longer exists in the JDBC driver directory. 3 Select the JAR file to restore, then choose Restore. BIRT Designer restores the driver from the original location to the JDBC driver directory and replaces the x next to the file name with a plus sign (+), as shown in Figure 5-6. Figure 5-6 JDBC driver restored Using a SQL query to retrieve data from a JDBC data source Typically, a JDBC data set retrieves data using a SQL query. SQL is a standard query language for requesting data from a database. This section discusses how to write a basic SQL query and how to combine data from multiple tables. In many cases, a basic knowledge of SQL is sufficient to retrieve the data a report requires. Chapter 5, Accessing data in a JDBC database 67 Writing a basic SQL query A SQL query consists of one or more statements. The first statement of a SQL query is the SELECT statement that specifies which columns to retrieve from the database. The SELECT statement contains two required clauses: SELECT and FROM. The SELECT clause lists the columns to retrieve. The FROM clause specifies the table from which to retrieve the selected columns of data. The following is an example of a SQL statement that selects the firstname and lastname columns from a table called customers: SELECT customers.firstname, customers.lastname FROM customers A SQL SELECT query can also include other clauses that limit what data a query returns. Use the WHERE clause to specify criteria that results must meet and use ORDER BY to sort results. The following is an example of the same SQL statement, with the addition of the WHERE and ORDER BY clauses: SELECT customers.firstname, customers.lastname FROM customers WHERE customers.country = 'Japan' ORDER BY customers.lastname, customers.firstname Combining data from multiple tables Typically, you have to select data from two or more tables to retrieve complete data for your report. This operation is called a join. You join tables in a database through a common column called a key. For example, suppose you want to retrieve the orders for every customer. The database, however, stores customer information in a Customers table, and order information in an Orders table, as shown in Figure 5-7. Both tables contain a column called CustomerID. You can join the customers and the orders table using the CustomerID column. Customers CustomerID CustomerName OrderID Amount CustomerID 01 Mark Smith 110 251.49 02 02 Maria Hernandez 115 145.75 03 03 Soo-Kim Young 120 176.55 01 04 Patrick Mason Figure 5-7 68 Orders Provisioning Data Database stores customer and order information in two tables To retrieve order information for every customer, use the following SELECT statement: SELECT Customers.CustomerName, Orders.Amount FROM Customers, Orders WHERE Customers.CustomerID = Orders.CustomerID The WHERE clause in this example specifies that the query returns rows where the CustomerID in both tables match. Figure 5-8 shows the results that the SELECT statement returns. Alternatively, use the JOIN keyword to select data from the two tables. The rest of this section describes the different types of joins you can use, and the results that each join returns. The following SELECT statement uses INNER JOIN and returns the same results shown in Figure 5-8: SELECT Customers.CustomerName, Orders.Amount FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID CustomerName Amount Mark Smith 176.55 Maria Hernandez 251.49 Soo-Kim Young 145.75 Figure 5-8 Results returned by SELECT statement The INNER JOIN clause returns all rows from both tables where the two CustomerID fields match. If there are rows in the Customers table that do not match rows in the Orders table, those rows are not listed. In the example, Patrick Mason is not listed in the result set because this customer does not have a matching order. To obtain all the customer names, whether or not a customer has an order, use the LEFT JOIN clause, as shown in the following example: SELECT Customers.CustomerName, Orders.Amount FROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID LEFT JOIN returns all rows from the first (left) table, even if there are no matches in the second (right) table. Figure 5-9 shows the results of the SELECT statement that uses the LEFT JOIN clause. Here, Patrick Mason is listed in the result set even though he does not have an order, because the record is in the first table. Chapter 5, Accessing data in a JDBC database 69 CustomerName Amount Mark Smith 176.55 Maria Hernandez 251.49 Soo-Kim Young 145.75 Patrick Mason Figure 5-9 Results of a left join Conversely, to retrieve all rows from the second table (the Orders table in our example), even if there are no matches in the first table (the Customers table), use the RIGHT JOIN clause, as shown in the following example: SELECT Customers.CustomerName, Orders.Amount FROM Customers RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID In our example, all the rows in the second table match rows in the first table, so the result is the same as in Figure 5-8. If, however, the Orders table had contained rows that did not have matches in the Customers table, those rows would also have been returned. To retrieve all customer names and orders from both tables, even if there are no matching values, you can use the FULL OUTER JOIN clause, as shown in the following example: SELECT Customers.CustomerName, Orders.Amount FROM Customers FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID In our example, the result is the same as in Figure 5-9. All the customer names and all the order amounts are returned. Some databases do not support FULL OUTER JOIN. In most cases, you can get the same results using the UNION operator. Note that in all the examples, the SELECT statements specify the columns being joined: Customers.CustomerID and Orders.CustomerID. You must specify the columns to join. If you do not, the result is what is commonly referred to as a Cartesian join. In a Cartesian join, all rows in the first table are joined with all rows in the second table. If the first table has 1000 rows and the second table has 10,000 rows, the Cartesian join returns 10,000,000 rows, a result you rarely want. The inner, left, and right joins are the most common types of joins. For more information about these joins and others that your database supports, see the database manufacturer’s documentation. 70 Provisioning Data How to create a SQL query to retrieve data from a JDBC data source This procedure assumes you have already created the JDBC data source that this data set uses. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, under JDBC Data Source, select the data source to use. 2 In Data Set Type, select SQL Select Query. 3 In Data Set Name, type a name for the data set. 4 Choose Next. Query displays information to help you create a SQL query. Available Items lists the items in the data source. 3 To see the tables in a database, expand the database, as shown in Figure 5-10. Figure 5-10 Viewing a schema Chapter 5, Accessing data in a JDBC database 71 You can use the following filter options to display specific schemas or objects. Choose Apply Filter after specifying the filters. ■ In Schema, select an item to display only objects from that schema. ■ In Type, select the type of objects—tables, views, or stored procedures—to display. ■ In Filter, type one or more letters to display only the objects that have names beginning with that letter or letters. You can also use SQL filter characters for the database that you are using. For example, on some databases, an underscore (_) matches any single character, and the percent sign (%) matches any sequence of characters. 4 To display the columns in a table or view, click the plus sign (+) next to a table or view name. 5 In the text area, type a SQL statement that indicates what data to retrieve from the JDBC data source. Alternatively, drag tables, views, and columns from Available Items to the text area to insert their names in the SQL statement at the insertion point, as shown in Figure 5-11. For some databases, if a table or column name contains spaces or SQL reserved words, you must enclose the name in quotation marks (" "). If you drag and drop tables and columns, and those items need to be enclosed in double quotation marks, select the Quote all identifiers option. When this option is selected, the data set editor inserts quotation marks around a table or column name when you drop it in the text area. 6 Choose Finish. Edit Data Set displays the columns specified in the query, and provides options for editing the data set. Using a stored procedure to retrieve data from a JDBC data source BIRT Designer also supports using a stored procedure to retrieve database data. As its name suggests, a stored procedure is a procedure that is stored in a database. A stored procedure consists of SQL statements used to execute operations or queries on a database. A stored procedure can: 72 ■ Return a result set, which is a set of rows. ■ Accept input parameters, which are parameters used to pass data to the stored procedure. For example, a stored procedure runs a query that returns the customer name and credit limit for a specified customer ID. In this case, the stored procedure defines an input parameter to get the customer ID. ■ Use output parameters to return values. The stored procedure described in the previous point uses two output parameters to return the name and credit limit for a specified customer ID. Provisioning Data Figure 5-11 Adding a table to a SQL query To run a stored procedure, use the call statement. The following statement is an example of running a stored procedure named getEmployeeData that contains no parameters. This type of stored procedure typically returns a result set. {call getEmployeeData()} The following is an example of running a stored procedure named getClientData that contains three parameters. The first and second parameters are output parameters, and the third is an input parameter. {call getClientData(?, ?, 103)} Each ? character is a placeholder for the output parameter value that the stored procedure returns. BIRT relies on the capabilities of the underlying JDBC driver in its support for stored procedures. For more robust support, use a JDBC driver that fully implements the JDBC interfaces that are related to stored procedures, including those that provide its metadata. The jTDS project on SourceForge.net, for Chapter 5, Accessing data in a JDBC database 73 example, provides a pure Java (type 4) JDBC 3.0 driver for Microsoft SQL Server, which supports stored procedures. How to use a stored procedure to retrieve data from a JDBC data source This procedure assumes you have already created the JDBC data source that this data set uses. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the JDBC data source to use. 2 In Data Set Type, select SQL Stored Procedure Query. 3 In Data Set Name, type a name for the data set. Choose Next. Query displays a template for executing a stored procedure. 3 In Available Items, navigate to the stored procedure. Select the stored procedure, and drag it to the text area. The stored procedure name appears at the insertion point. Type the arguments if the stored procedure uses parameters. Figure 5-12 shows a stored procedure selected in Available Items. The stored procedure has three parameters. The text area displays the call statement to run the stored procedure. 4 Choose Finish to save the data set. 5 Verify the results returned by the stored procedure, using one of the following methods: ■ If the stored procedure returns a result set, choose Preview Results to see the data rows. If a stored procedure returns multiple result sets, select the result set you want by specifying the result set’s name or number. To do so, choose Settings and, in Result Set Selection, specify the result set. Figure 5-13 shows an example of selecting the second result set. ■ If the stored procedure returns output parameter values, choose Preview Output Parameters. Figure 5-14 shows the values returned by the stored procedure shown in Figure 5-12. Accessing data using a graphical query In addition to providing a query editor for typing the SQL statement to specify the data to retrieve, BIRT Designer provides a SQL query builder that you can use to create SQL statements graphically. The query builder provides several advantages over the query editor: 74 Provisioning Data ■ Figure 5-12 Displaying stored procedures in a database Figure 5-13 Selecting a result set The ability to create and edit SQL statements quickly without typing SQL code. This feature is particularly useful when creating multiple joins, using SQL functions, or writing complex expressions. Chapter 5, Accessing data in a JDBC database 75 Figure 5-14 Previewing output parameter values ■ The ability for BIRT to modify a SQL query when you sort, group, or filter data using the graphical tools. BIRT maps these user-interface actions to the equivalent SQL expressions, and because the database processes the data, the report’s performance improves. ■ Join pushdown. In many cases, joins created in the query builder are processed by the database, or “pushed down.” The database can process joins far more efficiently than BIRT can. This topic describes how to access database data using the query builder. For information about using the query editor, see “Accessing data using a textual query or stored procedure,” earlier in this chapter. As with other types of data sources, for a report to retrieve data from a database using the query builder, you must create the following BIRT objects: ■ A data source that contains the information to connect to the database ■ A data set that specifies the data to retrieve from the database Connecting to a database The connection properties vary depending on the specific database, but the following information is typically required: ■ The database type ■ The JDBC driver ■ The database name and URL ■ Login credentials If you get a java.sql.SQLInvalidAuthorizationSpecException error message when you attempt to connect to the database, edit the JDBC driver and connection details. Uncheck Use integrated authentication. Then set the following properties in Properties—Optional: 76 Provisioning Data ■ AuthenticationMethod= ■ domain= where is the authentication method, for example ntlmjava, and is the domain. How to connect to a database 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select JDBC Database Connection for Query Builder from the list of data source types. 2 In Data Source Name, type a name for the data source. Choose Next. 3 In New JDBC Database Connection for Query Builder, select the database to which to connect. To connect to the Classic Models sample database, select Derby. Choose Next. 4 In New Connection Profile, select a driver, and provide the information to connect to the database. Figure 5-15 shows an example of the properties to connect to the sample database. These properties are supplied by default when you select Derby as the connection profile type. To connect to a different Derby database, select a different driver and specify the appropriate connection information. Figure 5-15 Connection properties for the sample database Choose Test Connection to check the connection to the database. Choose Finish. Chapter 5, Accessing data in a JDBC database 77 Specifying the data to retrieve When you create a JDBC Database Connection for Query Builder data source, as described in the previous topic, you have access to the SQL query builder. This graphical tool provides access to your database schema and objects, and wizards that help you select and join tables and sort, group, and filter data. How to create a query using the query builder 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the JDBC Database Connection for Query Builder data source to use. Data Set Type displays SQL Select Query [Query Builder]. 2 In Data Set Name, type a name for the data set. 3 Choose Next. The query builder appears, as shown in Figure 5-16. The top pane displays a SELECT statement. You can type a query here, or use the tools in the middle and bottom panes to create the query. The rest of this procedure describes the steps for creating a query graphically. Figure 5-16 Query Builder 3 Select the tables and columns that contain the data to use in the report. 1 Right-click in the middle pane, and choose Add Table. 78 Provisioning Data 2 Expand a database schema, and select the desired table. The table and its columns appear in the middle pane. 3 Select the desired columns. 4 Repeat the previous steps to select columns in other tables. 4 Join the tables. 1 Right-click in a table, then choose Create Join. 2 In Create Join, specify the tables and columns to join, and the join type. Figure 5-17 shows an inner join on the CUSTOMERNUMBER columns in the CUSTOMERS and ORDERS tables. Figure 5-17 An inner join between two tables 3 Repeat the previous steps to join all the tables. Figure 5-18 shows three tables that are joined. The SELECT statement in the top pane is updated to reflect the selected columns and table joins. The bottom pane lists the selected columns and their properties. 5 If necessary, use the tabs in the bottom pane to do the following: ■ Choose Columns to edit column properties and create computed columns or complex expressions. ■ Choose Conditions to filter data rows. ■ Choose Groups to group aggregate data. ■ Choose Group Conditions to filter groups. The following sections provide more information about each task. Chapter 5, Accessing data in a JDBC database 79 Figure 5-18 Query Builder showing three joined tables Creating computed columns and complex expressions Using SQL, you can manipulate data to return it in the format that you require. You can create computed columns that return values derived from multiple fields, for example: QuantityOrdered * PriceEach ContactFirstName || ' ' || ContactLastName You can aggregate data using SQL functions, for example: SUM(OrderAmount) AVG(OrderTotal) You can create statements that provide if-then-else logic, for example: CASE WHEN QuantityInStock > 0 THEN 'In Stock' ELSE 'Out of Stock' END How to create a computed column 1 Choose Columns. 2 In Column, click in an empty cell. Click the arrow button, scroll down the list of available columns, and choose Build Expression. 80 Provisioning Data 3 Click outside the cell to open the expression builder. 4 In Expression Builder, select the type of expression to build and choose Next. Expression Builder displays different properties depending on the expression type. Figure 5-19 shows an example of a function expression. The expression uses an aggregate function, SUM, to calculate order totals. The expression, SUM(QUANTITYORDERED * PRICEEACH), is created by selecting the SUM function, and the required columns and operator to use in the calculation. Figure 5-19 Example of a function expression Choose Finish. The computed column appears under Columns. 5 In Alias, type an alias for the computed column so that it is easily identified. Filtering data rows Databases typically contain vast amounts of data. Reports, however, typically use a small subset of the data, so SQL queries often contain filter conditions to limit the rows returned. How to filter data rows 1 Choose Conditions. 2 Create a filter condition. Chapter 5, Accessing data in a JDBC database 81 1 In Column, select a column or Build Expression to create an expression. 2 In Operator, select an operator. 3 In Value, select a column or type a value. 4 In AND/OR, optionally select AND or OR to specify another filter condition. Figure 5-20 shows an example of a filter condition. Figure 5-20 Filter condition on data rows The following WHERE clause is added to the SELECT statement in the top pane: WHERE CLASSICMODELS.ORDERS.ORDERDATE > '2004-06-30' Grouping data If you use aggregate functions, such SUM or AVG, you typically have to group the results by one or more columns. For example, an Orders table contains order records and some customers have multiple orders. To get the order totals for each customer, you would use the GROUP BY clause to group the customers, and the SUM function to aggregate the order totals by customer. How to group data 1 Choose Groups. 2 Under Column, click in an empty row. Click the arrow button, and select the column whose aggregate data to group. Figure 5-21 shows an example in which results are grouped on the CUSTOMERNAME column. Figure 5-21 Grouping on the CUSTOMERNAME column The following GROUP BY clause is added to the SELECT statement: GROUP BY CLASSICMODELS.CUSTOMERS.CUSTOMERNAME 82 Provisioning Data Filtering groups You can specify filter conditions for data that is grouped. For example, if order records are grouped by customer, you can select only customers whose order totals exceed $150,000. The filter condition for a group is specified using the HAVING clause. This clause behaves like the WHERE clause, but is applicable to groups. The WHERE clause, on the other hand, applies to individual rows. A SELECT statement can contain both WHERE and HAVING clauses. For example, you can select customers whose order totals exceed $150,000, factoring only orders placed after 06/30/2004. The SELECT statement would look like the following: SELECT CustomerName, SUM(OrderAmount) FROM Orders WHERE OrderDate > '2004-06-30' GROUP BY CustomerName HAVING SUM(OrderAmount) > 150000 How to filter groups 1 Choose Group Conditions. 2 Create a filter condition. 1 In Column, select a column or Build Expression to create an expression. 2 In Operator, select an operator. 3 In Value, select a column or type a constant value. 4 In AND/OR, optionally select AND or OR to specify another filter condition. Figure 5-22 shows an example of a filter condition specified for groups. Figure 5-22 A filter condition specified for groups The following HAVING clause is added to the SELECT statement: HAVING SUM(CLASSICMODELS.ORDERDETAILS.QUANTITYORDERED * CLASSICMODELS.ORDERDETAILS.PRICEEACH) >= 150000 Chapter 5, Accessing data in a JDBC database 83 84 Provisioning Data Chapter 6 Accessing data in a NoSQL database Chapter 6 This chapter contains the following topics: ■ About NoSQL databases ■ Accessing data in Amazon DynamoDB ■ Accessing data in Amazon Relational Database Service ■ Accessing data in Cassandra ■ Accessing data in Cloudera ■ Accessing data in Hadoop ■ Accessing data in HBase ■ Accessing data in MongoDB ■ Accessing data in Salesforce.com Chapter 6, Accessing data in a NoSQL database 85 About NoSQL databases A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases are often highly optimized key–value stores intended for simple retrieval and appending operations, with the goal being significant performance benefits in terms of latency and throughput. NoSQL databases are finding significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as “Not only SQL” to emphasize that they may in fact allow SQL-like query languages to be used. Reports designed using BIRT Report Designer can access data from the following NoSQL databases: ■ Cassandra scripted data source ■ Hadoop (accessed using a Hive driver) ■ MongoDB In addition, reports designed using BIRT Designer Professional can access data from the following NoSQL databases: ■ Amazon DynamoDB ■ Amazon Relational Database Service ■ Cloudera ■ HBase ■ Salesforce.com Accessing data in Amazon DynamoDB Amazon DynamoDB is a fully-managed database service that supports the operations of a non-relational, or NoSQL, database in the cloud. Organizations use Amazon DynamoDB to manage large volumes of data that do not require the features of a relational database, and to offload the administrative costs and complexities of operating a database. Amazon DynamoDB is a schema-less database, and does not support relational queries, such as joins, or complex transactions. It is particularly useful for storing data, such as product catalogs, logs, or forum postings, where the data structure does not require a relational model. 86 Provisioning Data An Amazon DynamoDB database organizes data into tables. A table is a collection of items and each item is a collection of attributes. For example, a products table can contain items, such as books, DVDs, and CDs. A book item can have ID, title, author, publisher, and ISBN attributes. A CD item can have ID, title, and artiste attributes. Unlike a database with a schema, items in an Amazon DynamoDB table are not required to have the same number or types of attributes. Each attribute is a name-value pair. The value can be a single value or a multi-value set, as shown in the following examples: Author = "J.K. Rowling" Author = "Susan Smith", "John Smith" For each item, one attribute—typically an ID—must be a primary key that identifies the item in a table. The primary key is the only part of the table that is indexed, and it is also used to hash partition data across multiple servers. BIRT Designer Professional supports access to data in Amazon DynamoDB. As with other types of data sources, for a report to use data from Amazon DynamoDB, you must create the following BIRT objects: ■ A data source that contains the information to connect to a DynamoDB database ■ A data set that specifies the data to retrieve Connecting to Amazon DynamoDB BIRT Designer Professional provides an ODA (Open Data Access) driver to connect to Amazon DynamoDB. You provide the regional endpoint to a web service to access an Amazon DynamoDB database, as well as, your access credentials. How to create an Amazon DynamoDB data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Amazon DynamoDB Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Amazon DynamoDB Data Source Profile, specify the properties to connect to your Amazon DynamoDB database instance. 1 In AWS Region Endpoint URL, specify the regional endpoint to the web service to which to make requests. The default value specifies the US-East region, which the AWS SDKs and console for Amazon DynamoDB reference. For a list of supported regions and endpoints, see the Amazon DynamoDB documentation. Chapter 6, Accessing data in a NoSQL database 87 2 In Amazon Web Services Access credentials, type your user credentials to log in to the system. Set Use Security Token Service to yes to use temporary security credentials. 3 In Amazon Web Services Client Configuration: ❏ In Connection Timeout, specify the number of milliseconds to wait for a connection. ❏ In Maximum Number of Retries, specify the number of times to attempt to connect until a successful connection is established. ❏ In Data Transfer Timeout, specify the number of milliseconds to wait for a response to a data request. Figure 6-1 shows an example of properties to connect to a database instance in Amazon DynamoDB. Figure 6-1 Connection properties for an Amazon DynamoDB database Specifying the data to retrieve from Amazon DynamoDB Once the report connects to an Amazon DynamoDB database, you create a data set and select the table from which to retrieve data. A data set can retrieve data from one table only. After selecting a table, you select the attributes from which to retrieve data. BIRT maps each selected attribute to a data set column. Because DynamoDB is a schema-less database in which each table item can contain a different set of attributes, you have the option of specifying the number of items to scan to compile the list of attributes. Scanning items in a table can be resource intensive. 88 Provisioning Data If all the table items contain the same attributes, specify one (the default) as the number of items to scan. How to specify what data to retrieve from an Amazon DynamoDB database 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Amazon DynamoDB data source to use. Data Set Type displays Amazon DynamoDB Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New Data Set, in Query, do the following: 1 In DynamoDB Table, select the table from which to retrieve data. 2 In Number of Items to Scan for Attributes, type the number of table items for which to search for attributes, then choose Scan. The fewer the number of items to scan, the faster the response. Available Attributes displays the attributes defined in the scanned items. If you do not see the attributes you expect and want, increase the number of items to scan, then choose Scan. 3 In Available Attributes, select the attribute or attributes whose data to retrieve. 4 If Searchable by Composite Key is available, you can filter the data to retrieve by searching for a hash key value, a range key value, or both. For information about this task, see “Filtering by a composite primary key,” later in this chapter. 5 In Advanced Settings, specify the following options: ❏ In AWS fetch size, type the maximum number of items to return in each web service call to the database, or select No fetch size limit. If you select the latter, each fetch operation returns the entire result set up to 1MB, the limit set by Amazon DynamoDB. The smaller the fetch size value, the faster the response time per web service call. The higher the fetch size, the fewer the calls to fetch data. ❏ Select the Eventually consistent reads option to maximize the read throughput. Deselect this option to request a strongly consistent read, which returns a result that reflects all writes that receive a successful response prior to the read. Chapter 6, Accessing data in a NoSQL database 89 ❏ In Separator character(s) in a multi-valued set column, specify the character to use to separate values in a multi-value set. By default, BIRT returns a multi-value set as a string in the following format: Value1|Value2|Value3 You can change the separator character to a comma, for example, to return results in the following format: Value1,Value2,Value3 Figure 6-2 shows an example of an Amazon DynamoDB query. Figure 6-2 Example of an Amazon DynamoDB query 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set. 5 Choose Preview Results to view the data rows returned by the data set. Filtering data Amazon DynamoDB is designed to store large volumes of data across multiple servers. Database users must design tables for efficient write and read operations. As discussed earlier, the required primary key is the only part of a table that is indexed, and it is also used to hash partition data across multiple servers. Amazon DynamoDB supports two types of primary keys: ■ 90 Hash primary key, which consists of one attribute. For example, a product catalog table can use ProductID as its primary key. Provisioning Data ■ Composite primary key, which consists of two attributes. The first attribute is a hash attribute and the second attribute is a range attribute. For example, a forum table can use ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. A table’s primary key type determines how you specify a filter condition, and how Amazon DynamoDB searches for data, as the following sections describe. Filtering by a composite primary key A composite key supports searching for a specific value in the hash attribute, and can include searching on the range attribute as well. Searching on both attributes narrows a search. When a composite primary key is defined for a table, Amazon DynamoDB uses its Query API to search on the key index only. This type of search is typically efficient. If you select a table that uses a composite primary key, the query page of the data set editor displays the Searchable by Composite Key option, as shown in Figure 6-3. This option is disabled if the selected table uses a hash primary key. Figure 6-3 Query page displaying the Searchable by Composite Key option In this example, the hash attribute is ForumName and the range attribute is Subject. You can select one or both of these attributes on which to filter. Each attribute you select creates a corresponding data set parameter, as shown in Figure 6-4. You must edit each data set parameter to specify the attribute value to search. Figure 6-5 shows searching for the value Amazon DynamoDB in the ForumName hash attribute. Chapter 6, Accessing data in a NoSQL database 91 Figure 6-4 Data set parameters associated with the selected attributes in the composite primary key Figure 6-5 Parameter value specified for an attribute in a composite primary key Filtering by an attribute You can filter data by any attribute selected in a data set. When filtering by an attribute that is not a primary key, Amazon DynamoDB uses its Scan API to scan the entire table, then filters out values to provide the desired result set. This type of search is not efficient, and slows down as a table grows. To filter by an attribute that is not a composite primary key, use the Filters page in the data set editor. Figure 6-6 shows an example of a filter condition created for the Product Catalog data set, where the BicycleType attribute is equal to Road. This filter condition uses the Equal to operator, which looks for an exact match. With this filter, a match is found if the BicycleType attribute contains the single value, Road. As mentioned earlier, an attribute, however, can contain a multi-value set, which BIRT returns in value1|value2|value3 format. If the BicycleType attribute contains a multi-value set, such as Road|Hybrid, there is no match. 92 Provisioning Data Figure 6-6 A filter condition specified for an attribute If you do not know whether a string attribute contains a single value or a multi-value set, do not use the Equal to operator in the filter condition. Instead, use the following operator: Contains substring, or value in a set To exclude a value when comparing values in a multi-value set, use the following operator: Absence of substring, or value in a set Figure 6-7 shows a filter condition where the Color attribute must contain the value Red. Figure 6-7 A filter condition specified for an attribute that contains a multi-value set Chapter 6, Accessing data in a NoSQL database 93 Figure 6-8 shows the data rows returned when the filter condition in Figure 6-7 is applied. The Color column in each row contains the value Red. Figure 6-8 Results of applying a filter condition Accessing data in Amazon Relational Database Service Amazon Relational Database Service (RDS) is a web service that supports the operations of a relational database in the cloud. It provides access to a MySQL or Oracle database and manages all administrative tasks, including hardware and software configuration and updates, automated backups, and system monitoring. Organizations use Amazon RDS to manage their database deployment in the cloud, paying only for the computing resources and storage that their database instance uses. By off-loading the costs and complexities of purchasing and maintaining hardware and database software, organizations can focus on their business applications. BIRT Designer Professional supports access to data in Amazon RDS. As with other types of data sources, for a report to use data from Amazon RDS, you must create the following BIRT objects: ■ A data source that contains the information to connect to a MySQL or Oracle database in Amazon RDS ■ A data set that specifies the data to retrieve Connecting to Amazon RDS BIRT Designer Professional provides two JDBC drivers to connect to MySQL and Oracle, the databases supported by Amazon RDS. You provide the URL to 94 Provisioning Data connect to the specific database instance on Amazon RDS, and your login credentials. The URL, or endpoint, is available in the DB Instance description in Amazon’s AWS (Amazon Web Services) Management Console. How to create a Amazon RDS data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Amazon RDS Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Amazon RDS Data Source Profile, specify the properties to connect to the MySQL or Oracle database instance on Amazon RDS. 1 In Driver Class, select either the MySQL or Oracle JDBC driver. 2 In Endpoint, type the URL to the database instance. 3 In User Name and Password, type the user credentials to log in to the system. Figure 6-9 shows an example of properties to connect to a MySQL database instance in Amazon RDS. Figure 6-9 Connection properties for a MySQL instance on Amazon RDS Specifying the data to retrieve from Amazon RDS Once the report connects to a MySQL or Oracle database in Amazon RDS, you create a data set and write a SQL query to specify what data rows to retrieve. How to specify what data to retrieve from a database in Amazon RDS 1 In Data Explorer, right-click Data Sets, then choose New Data Set. Chapter 6, Accessing data in a NoSQL database 95 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Amazon RDS data source to use. Data Set Type displays SQL Select Query. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New Data Set, in Query Text, type a SQL statement that indicates what data to retrieve. 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set. 5 Choose Preview Results to view the data rows returned by the data set. Accessing data in Cassandra Originally developed by Facebook to handle its massive data requirements, before being open sourced to the Apache Foundation, Apache Cassandra is an open source, distributed, NoSQL database system. These data sources are an alternative to the more traditional relational database management systems. The Cassandra system offers scalability, performance, tunability, and reliability on commodity hardware or in a cloud infrastructure. Cassandra also supports replicating data across multiple data centers for improved performance for end users and higher availability of very large amounts of data. Cassandra distributes data across all nodes in the same way, without a master node or named node, and Cassandra supports read/write anywhere in the cluster. Cassandra uses a row-oriented column schema, similar to Google BigTable. Within that schema are: ■ Keyspaces, which are comparable to databases in RDBMS ■ Column families, which are comparable to RDBMS tables, although they offer more flexibility ■ Rows and columns in column families, which can be indexed by key values To access Cassandra data using BIRT Report Designer, you must develop JavaScript code to call the Cassandra Hector API to retrieve data. More information about this requirement is available at the following URL: http://birtworld.blogspot.com/2012/09/birt-cassandra-andhector.html As with other types of data sources, for a report to use data from an Apache Cassandra database, you must create the following BIRT objects: ■ 96 A data source that contains the information to connect to a Cassandra database Provisioning Data ■ A data set that specifies the data to retrieve Connecting to a Cassandra scripted data source BIRT Report Designer provides a driver to connect to a Cassandra data source. You use scripts to specify which data to retrieve, as described in “Specifying the data to retrieve from Amazon DynamoDB,” later in this chapter. How to create a Cassandra data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information, then choose Finish: 1 Select Cassandra Scripted Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. Specifying the data to retrieve from a Cassandra scripted data source To specify what data rows to retrieve from a Cassandra database system, you write a query using CQL. In the BIRT Report Designer interface, you can specify the following types of clauses to create your CQL query: ■ open ■ describe ■ fetch ■ close ■ beforeOpen ■ beforeClose ■ onFetch ■ afterOpen ■ afterClose How to specify what data to retrieve from a Cassandra data source In this procedure, use the following open clause in a CQL query: importPackage(Packages.java.util); importPackage(Packages.me.prettyprint.cassandra.serializers); importPackage(Packages.me.prettyprint.cassandra.service); importPackage(Packages.me.prettyprint.hector.api); importPackage(Packages.me.prettyprint.hector.api.beans); importPackage(Packages.me.prettyprint.hector.api.factory); importPackage(Packages.me.prettyprint.hector.api.query); Chapter 6, Accessing data in a NoSQL database 97 importPackage(Packages.me.prettyprint.cassandra.model); var cluster = HFactory.getOrCreateCluster("Test Cluster",new CassandraHostConfigurator("192.168.218.246:9160")); var keyspace = HFactory.createKeyspace("users", cluster); var cqlQuery = new CqlQuery(keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get()); cqlQuery.setQuery("select * from User"); var resultCQL = cqlQuery.execute(); rowsIterator = resultCQL.get().iterator(); Also use the following fetch clause: if (rowsIterator.hasNext()) { var myrow = rowsIterator.next(); var cols = myrow.getColumnSlice().getColumns(); for( ii=0; ii < cols.size(); ii++ ){ row[cols.get(ii).getName()] = cols.get(ii).getValue(); } return true; }else{ return false; } 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Cassandra data source to use. Data Set Type displays HQL Select Query. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In Output columns, choose Add and add the necessary output columns to retrieve from the data source. For each output column, you can specify the following values on New Script Data Set Column, as shown in Figure 6-10: ■ Name ■ Type ■ Alias ■ Analysis type ■ Display Name ■ Display Name Key When finished specifying the data set column properties, choose OK. 98 Provisioning Data Figure 6-10 Adding an output column The specified output columns appear in New Data Set, as shown in Figure 6-11. In this example, the output columns are first, last, and age. Figure 6-11 Specifying output columns to retrieve 4 Choose Finish. 5 In Script, select a type of clause from the drop-down menu, and specify the necessary CQL clause. For example: 1 In Script, select open, and specify a CQL clause similar to the following one, as shown in Figure 6-12: importPackage(Packages.java.util); importPackage(Packages.me.prettyprint.cassandra.serializers); importPackage(Packages.me.prettyprint.cassandra.service); importPackage(Packages.me.prettyprint.hector.api); importPackage(Packages.me.prettyprint.hector.api.beans); importPackage(Packages.me.prettyprint.hector.api.factory); importPackage(Packages.me.prettyprint.hector.api.query); importPackage(Packages.me.prettyprint.cassandra.model); var cluster = HFactory.getOrCreateCluster("Test Cluster",new CassandraHostConfigurator("192.168.218.246:9160")); Chapter 6, Accessing data in a NoSQL database 99 var keyspace = HFactory.createKeyspace("users", cluster); var cqlQuery = new CqlQuery(keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get()); cqlQuery.setQuery("select * from User"); var resultCQL = cqlQuery.execute(); rowsIterator = resultCQL.get().iterator(); Figure 6-12 Specifying the open clause 2 In Script, select fetch, and specify a CQL clause similar to the following one, as shown in Figure 6-13: if (rowsIterator.hasNext()) { var myrow = rowsIterator.next(); var cols = myrow.getColumnSlice().getColumns(); for( ii=0; ii < cols.size(); ii++ ){ row[cols.get(ii).getName()] = cols.get(ii).getValue(); } return true; }else{ return false; } Figure 6-13 100 Provisioning Data Specifying the fetch clause 3 Choose Preview Results to view the data rows returned by the data set. Accessing data in Cloudera Cloudera is the open source implementation of Google’s BigTable, a distributed data storage system. Cloudera runs on top of either Hadoop’s Distributed File System (HDFS) or Amazon’s Simple Storage Service (S3). At a very basic level, Cloudera is a map, an abstract data type that is composed of keys and values. An additional dimension of data is also available. For example, if the key/value pair identifies a cell value, as in a table, an additional dimension can identify time, enabling users to retrieve a particular data value from a particular time. Cloudera stores the key/value pairs in alphabetical order, further available by time, where the highest time values are the most recent. BIRT Designer Professional supports access to Cloudera data through Hive, which is a data warehouse infrastructure built on top of Hadoop. Hive facilitates data summarization, queries, and analysis. It provides a mechanism for structuring large data sets and querying the data using a SQL-like language called Hive Query Language (HQL). By using Hive to access data, you can write HQL queries to specify the data to retrieve. As with other types of data sources, for a report to use data from a Cloudera system, you must create the following BIRT objects: ■ A data source that contains the information to connect to a Cloudera system ■ A data set that specifies the data to retrieve Connecting to a Cloudera database BIRT Designer Professional provides a JDBC driver to connect to a Cloudera system through Hive. Similar to connecting to a database, you provide the URL to the Hive machine and your user credentials. How to create a Cloudera data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Cloudera Hive Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Hive Data Source Profile, specify the properties to connect to the Cloudera system. Chapter 6, Accessing data in a NoSQL database 101 1 In Database URL, type the URL to the Cloudera system. 2 In User Name and Password, type the user credentials to connect to the system. 3 In Add File Statement, optionally type one or more Add File statements to add script files to the Hadoop distributed cache. Separate each Add File statement with a semicolon. Figure 6-14 shows an example of properties to connect to a Cloudera Hive data source. Figure 6-14 Connection properties for a Cloudera Hive data source Specifying the data to retrieve from a Cloudera system HQL supports many of the same keywords as SQL, for example, SELECT, WHERE, GROUP BY, ORDER BY, JOIN, and UNION. The following is an example of a very simple HQL query that selects all from table_1: SELECT * FROM table_1 How to specify what data to retrieve from a Cloudera system 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Cloudera Hive data source to use. Data Set Type displays HQL Select Query. 102 Provisioning Data 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In HQL Query, in Query text, type a HQL statement that indicates what data to retrieve. Figure 6-15 shows an example of an HQL query specified in the data set editor. Figure 6-15 Data set editor displaying an HQL query 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set, as shown in Figure 6-16. Chapter 6, Accessing data in a NoSQL database 103 Figure 6-16 Data set editor displaying the output columns 5 Choose Preview Results to view the data rows returned by the data set, as shown in Figure 6-17. When finished, choose OK. Figure 6-17 Previewing query results Accessing data in Hadoop With data storage requirements approaching several petabytes, relational databases no longer meet the needs of many organizations. Facebook, for example, analyzes 15 terabytes of log data each day. To store and process vast amounts of data, organizations use “big data” systems such as Hadoop. An open source software framework designed for scalable, distributed computing, Hadoop spreads and manages data on clusters of servers and coordinates work among them. To achieve reliability and efficiency in distributed processing, Hadoop uses a MapReduce programming model, which uses map and reduce operations to 104 Provisioning Data divide data-intensive tasks, such as data searches or data aggregation, into discrete tasks that can be done in parallel across clusters of servers. The map phase occurs when each discrete task is distributed, or mapped, to all the servers. The reduce phase occurs when the intermediate results are merged, or reduced, into one result set. BIRT Report Designer supports access to Hadoop data through Hive, which is a data warehouse infrastructure built on top of Hadoop. Hive facilitates data summarization, queries, and analysis. It provides a mechanism for structuring large data sets and querying the data using a SQL-like language called Hive Query Language (HQL). By using Hive to access data, you can write HQL queries, instead of MapReduce functions, to specify the data to retrieve. As with other types of data sources, for a report to use data from a Hadoop system, you must create the following BIRT objects: ■ A data source that contains the information to connect to a Hive system ■ A data set that specifies the data to retrieve Connecting to a Hadoop system BIRT Report Designer provides a JDBC driver to connect to a Hadoop system through Hive. Similar to connecting to a database, you provide the URL to the Hive machine and your user credentials. You also have the option to pass MapReduce scripts to Hadoop. These scripts can be written in any programming language. How to create a Hive data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Hive Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Hive Data Source Profile, specify the properties to connect to the Hive system. 1 In Database URL, type the URL to the Hive system. 2 In User Name and Password, type the user credentials to connect to the system. 3 In Add File Statement, optionally type one or more Add File statements to add script files to the Hadoop distributed cache, as shown in the following example. Separate each Add File statement with a semicolon. add file /usr/local/hive/rating_mapper.py; add file /usr/local/hive/weekday_mapper.py; Chapter 6, Accessing data in a NoSQL database 105 Figure 6-18 shows an example of properties to connect to a Hive system. Figure 6-18 Connection properties for a Hive system Specifying the data to retrieve from a Hadoop system To specify what data rows to retrieve from a Hadoop system running Hive, you write a query using HQL. As mentioned earlier, HQL is similar to SQL. HQL supports many of the same keywords as SQL, for example, SELECT, WHERE, GROUP BY, ORDER BY, JOIN, and UNION. Hive transforms HQL statements into MapReduce jobs that Hadoop uses to perform and manage parallel processing across the clusters of servers. You can embed your own MapReduce scripts in the query by using the TRANSFORM clause. You make these scripts available to Hadoop through the Add File property when you configure the connection properties, as described in the previous topic. The following is an example of a HQL query that uses the TRANSFORM clause: SELECT TRANSFORM (userid, movieid, rating, unixtime) USING 'python weekday_mapper.py' AS (userid, movieid, rating, weekday) FROM u_data How to specify what data to retrieve from a Hadoop system 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Hive data source to use. Data Set Type displays HQL Select Query. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 106 Provisioning Data 3 In HQL Query, in Query Text, type a HQL statement that indicates what data to retrieve. Figure 6-19 shows an example of an HQL query specified in the data set editor. Figure 6-19 Data set editor displaying an HQL query 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set, as shown in Figure 6-20. Figure 6-20 Data set editor displaying the output columns 5 Choose Preview Results to view the data rows returned by the data set. Chapter 6, Accessing data in a NoSQL database 107 Accessing data in HBase HBase is the open source implementation of Google’s BigTable, a distributed data storage system. HBase runs on top of either Hadoop's Distributed File System (HDFS) or Amazon's Simple Storage Service (S3). At a very basic level, HBase is a map, an abstract data type that is composed of keys and values. An additional dimension of data is also available. For example, if the key/value pair identifies a cell value, as in a table, an additional dimension can identify time, enabling users to retrieve a particular data value from a particular time. HBase stores the key/value pairs in alphabetical order, further available by time, where the highest time values are the most recent. BIRT Designer Professional supports access to HBase data through Hive, which is a data warehouse infrastructure built on top of Hadoop. Hive facilitates data summarization, queries, and analysis. It provides a mechanism for structuring large data sets and querying the data using a SQL-like language called Hive Query Language (HQL). By using Hive to access data, you can write HQL queries to specify the data to retrieve. As with other types of data sources, for a report to use data from an HBase system, you must create the following BIRT objects: ■ A data source that contains the information to connect to a Hive system ■ A data set that specifies the data to retrieve Connecting to an HBase database BIRT Designer Professional provides a JDBC driver to connect to an HBase system through Hive. Similar to connecting to a database, you provide the URL to the Hive machine and your user credentials. How to create an HBase data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select HBase Hive Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Hive Data Source Profile, specify the properties to connect to the HBase system. 1 In Database URL, type the URL to the HBase system. 108 Provisioning Data 2 In User Name and Password, type the credentials to connect to the system. 3 In Add File Statement, optionally type one or more Add File statements to add script files to the Hadoop distributed cache. Separate each Add File statement with a semicolon. Figure 6-21 shows an example of properties to connect to an HBase Hive data source. Figure 6-21 Connection properties for an HBase Hive data source Specifying the data to retrieve from an HBase system HQL supports many of the same keywords as SQL, for example, SELECT, WHERE, GROUP BY, ORDER BY, JOIN, and UNION. The following is an example of a very simple HQL query: SELECT * FROM hbase_table_1 How to specify what data to retrieve from an HBase system 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the HBase Hive data source to use. Data Set Type displays HQL Select Query. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In HQL Query, in Query text, type a HQL statement that indicates what data to retrieve. Figure 6-22 shows an example of an HQL query specified in the data set editor. Chapter 6, Accessing data in a NoSQL database 109 Figure 6-22 Data set editor displaying an HQL query 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set, as shown in Figure 6-23. Figure 6-23 110 Provisioning Data Data set editor displaying the output columns 5 Choose Preview Results to view the data rows returned by the data set, as shown in Figure 6-24. When finished, choose OK. Figure 6-24 Previewing query results Accessing data in MongoDB MongoDB is an open source NoSQL database. In a MongoDB instance, an element of data is called a document, and documents are stored in collections. The document structure follows the JavaScript Object Notation (JSON) format. JSON is derived from the JavaScript scripting language for representing simple data structures and associative arrays called objects. A document is an associative array (also called a map, dictionary, hash table or hash). The structure of a MongoDB instance is quite different from that of a relational database. In a relational database, data is organized in tables. Each table consists of records (rows), and each record consists of fields. Within each table, every record has the same fields in the same order. All the instances of a field taken together are called a column. In a MongoDB instance, collections are like tables, and documents are like records. The difference is that there are no fixed schemas, so any document in a collection can have different fields from the other documents.For example, Figure 6-25 shows a table in a relational database with columns Last Name, First Name, and Date of Birth. Figure 6-25 Table in a relational database The following code shows a MongoDB collection with similar data. The collection consists of a series of key-value pairs. The key is the name of the field and the Chapter 6, Accessing data in a NoSQL database 111 value is the field’s content. The key and value are separated by a colon (:). Notice that the second document contains Address and City fields as well as Last Name, First Name, and Date of Birth. { "_id": ObjectId("4efa8d2b7d284dad101e4bc9"), "Last Name": "DUMONT", "First Name": "Jean", "Date of Birth": "01-22-1963" }, { "_id": ObjectId("4efa8d2b7d284dad101e4bc7"), "Last Name": "PELLERIN", "First Name": "Franck", "Date of Birth": "09-19-1983", "Address": "1 chemin des Loges", "City": "VERSAILLES" } A value can be a number; a string; true or false; binary data such as an image; an array of values (each of which can be of a different type); or a subordinate document. In the following code, the Address field contains a subordinate document with two fields, Street and City. { "_id": ObjectId("4efa8d2b7d284dad101e4bc7"), "Last Name": "PELLERIN", "First Name": "Franck", "Date of Birth": "09-19-1983", "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567", "verified": false } ], "Address": { "Street": "1 chemin des Loges", "City": "VERSAILLES" }, "Months at Present Address": 37 } 112 Provisioning Data MongoDB features full index support, including secondary and compound indexes. Indexes are specified per collection. There is a document-based query language that leverages these indexes. MongoDB also provides atomic update modifiers to keep code contention-free. Clustered setups are supported, including easy replication for high availability, as well as auto-sharding for write-scaling and large data sets. BIRT supports access to data in MongoDB. As with other types of data sources, for a report to use data from MongoDB, you must create the following BIRT objects: ■ A data source that contains the information to connect to a MongoDB database ■ A data set that specifies the data to retrieve Connecting to MongoDB BIRT provides an ODA (Open Data Access) driver to connect to MongoDB. This driver supports MongoDB up to release 2.4. You provide the database URI to access a MongoDB release 2.4 or earlier database. How to create a MongoDB data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select MongoDB Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New MongoDB Data Source Profile, specify the properties to connect to your MongoDB database instance. Choose Mongo Database URI Elements to specify the elements of the URI individually, or choose Mongo Database URI to type the URI string. ■ If you choose Mongo Database URI Elements, provide values for the following properties: ❏ Server Host: The IP address of the database server to which to connect. ❏ Server Port: The port on which the database server listens for requests. The default value is 27017. ❏ Database Name: The name of the database from which to retrieve data. ❏ User Name and Password: User authentication is required only when the MongoDB process is run in a secure mode. Username and password are optional when MongoDB is run without authentication in a trusted environment. Chapter 6, Accessing data in a NoSQL database 113 ■ If you choose Mongo Database URI, type the URI string in the following format: mongodb://[username:password@]host1[:port1][,host2[:port2], ...[,hostN[:portN]]][/[database][?options]] 4 Optionally, enable one or both of the following properties: ■ Keep Socket Alive: When enabled, a MongoDB connection is kept alive through firewalls. ■ Consistent Request Session: When enabled, all database operations within an ODA connection use the same MongoDB connection, ensuring that they occur in a fixed order with predictable results. Figure 6-26 shows an example of properties to connect to a database instance in MongoDB. Figure 6-26 Connection properties for a MongoDB database 5 Choose Test Connection. If the connection is successful, choose Finish. If the connection is unsuccessful, you must provide different values for the connection properties. Specifying the data to retrieve from MongoDB Once the report connects to a MongoDB database, you create a data set and select the collection from which to retrieve data. A data set can retrieve data from one collection only. 114 Provisioning Data After selecting a collection, you select the fields from which to retrieve data. BIRT maps each selected field to a data set column. Because MongoDB is a schema-less database in which each document can contain a different set of fields, you have the option of specifying the number of documents to scan to compile the list of fields. Scanning documents can be resource intensive. If all the documents contain the same fields, specify one (the default) as the number of documents to scan. How to specify what data to retrieve from a MongoDB database 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the MongoDB data source to use. Data Set Type displays MongoDB Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New MongoDB Data Set, in Query, do the following: 1 In MongoDB Collection, select the collection from which to retrieve data. Not required if you enter a Run Database Command expression. 2 In Command Operation, in Command type, optionally select a command type. Choose Expression and type an expression in Command Expression Builder. The default command type is FindQuery. 3 In Number of Documents to search for fields, type the number of documents for which to search for fields, then choose Find Fields. The fewer the number of documents to scan, the faster the response. Available Fields displays the fields defined in the scanned documents. The letter D preceding a field name indicates that the field is a subordinate document. Square brackets ([]) following a field name indicate that the field is an array. If you do not see the fields you expect and want, increase the number of documents to scan, then choose Find Fields. 4 In Available Fields, select the field or fields whose data to retrieve. To move a single field from Available Fields to Selected Fields, select the field and choose >. You can select a field at any level, for example a subordinate document field, not just a leaf field (lowest level). To move all fields in a node from Available Fields to Selected Fields, select the node and choose >>. Figure 6-27 shows an example of a MongoDB query. Chapter 6, Accessing data in a NoSQL database 115 Figure 6-27 Example of a MongoDB query 4 Choose Finish to save the data set. Edit Data Set displays the output columns, and provides options for editing the data set. 5 Choose Preview Results to view the data rows returned by the data set. Mapping selected fields Available Fields, shown in Figure 6-2, displays fields and subordinate documents found by searching documents in a MongoDB collection. A subordinate 116 Provisioning Data document can, in turn, contain fields and other subordinate documents. For example, in Figure 6-2, Available Fields shows the fields _id and name and the subordinate document nestedDocs[]. The subordinate document nestedDocs[] contains the field text and the subordinate document follower. In Selected Fields, fields in subordinate documents are identified using dot notation, for example nestedDocs[].text and nestedDocs[].follower.followed. When a subordinate document and its fields are both selected, only the selected fields are included in the subordinate document. A field can be of type Array, and can contain multiple scalar values or subordinate documents. By default, scalar values in an array field are concatenated into a single String value in JSON expression format in a result set column, for example [1.0, 2.0, 3.0]. The field values in each subordinate document in an array field are also, by default, concatenated into a single String value in a result set column, for example: [{"followed":true,"ratings":[1.0,2.0,3.0]},{"followed":true, "ratings":[1.0,2.0,3.0]}] You can specify one or more array projection operators, such as $slice and $elemMatch, to project elements from an array in the result set. For example, you can use $slice to project elements from the zipcodes collection: { city: 1, loc: {$slice : 1}, state : true, _id : false } Specify array projection operators in Selected Fields on the Property Binding page in the Data Set Editor. Make sure that the field names projected in the result set match the output column names in the BIRT data set. Alternatively, you can flatten subordinate documents by setting the Flatten nested collections runtime connection property to true. Table 6-1 shows the native data type to ODA data type mapping for MongoDB. Table 6-1 Native data type to ODA data type mapping for MongoDB Native data type Native data type value ODA data type Number 1 Double String 2 String Object 3 JavaObject Array 4 JavaObject Binary 5 Blob Object id 7 String Boolean 8 Boolean Date 9 Date (continues) Chapter 6, Accessing data in a NoSQL database 117 Table 6-1 Native data type to ODA data type mapping for MongoDB (continued) Native data type Native data type value ODA data type Null 10 String Regular expression 11 String JavaScript 13 String Symbol 14 String JavaScript with scope 15 String 32-bit integer 16 Integer Timestamp 17 Timestamp 64-bit integer 18 String Min key 255 String Max key 127 String Writing expressions When you create a MongoDB data set query, you can create Mongo JSON expressions for the following: ■ Run Database commands ■ Aggregate commands ■ MapReduce commands ■ Query expressions ■ Sort expressions You can type the expression or import it from a text file. You can then validate the syntax of the expression. You can override an expression at runtime by setting the appropriate property or properties on the Property Binding page in the Data Set Editor. Embedded input parameter markers are not supported. Writing expressions for Run Database commands Run Database Command executes a MongoDB command-driven query. The command result document is exposed as available fields that can be selected and bound to a data set’s columns. These commands are run on the connected database; any specified MongoDB collection is ignored. Run Database Command supports read-only operation. The following MongoDB database commands are supported. 118 Provisioning Data ■ buildInfo ■ collStats ■ ■ ■ ■ ■ ■ connPoolStats count cursorInfo dataSize dbStats distinct ■ eval with nolock: true * ■ geoNear ■ geoSearch ■ getLastError ■ getLog ■ getPrevError ■ group ■ isMaster ■ isdbgrid ■ listCommands ■ listDatabases ■ listShards ■ ping ■ printShardingStatus ■ replSetGetStatus ■ serverStatus * eval with nolock: true is used to evaluate a function written in JavaScript at the database server. eval’s default behavior of taking a global write lock is not allowed. For more information about MongoDB database commands, see: http://docs.mongodb.org/manual/reference/command/ Writing expressions for Aggregate commands When writing expressions for Aggregate commands, including array markers at the beginning and end of a pipeline expression is optional. A sort value must be 1 or -1. For more information about Aggregate commands, see: http://docs.mongodb.org/manual/reference/aggregation/ For Aggregation Framework examples, see: http://docs.mongodb.org/manual/tutorial/aggregation-examples/ Writing query expressions A query expression is optional, and is only applicable to a data set that has either a FindQuery or MapReduce (with output to a collection) command operation. You must specify a MongoDB JSON expression to describe a custom $query operation to filter on the specified collection, or an output collection returned by the MapReduce command. A query expression is ignored when a MapReduce command outputs to inline using the syntax out: { inline: 1 }. In this case, you must specify the query parameters directly in the MapReduce command expression. For more information about query expressions and examples, see: http://docs.mongodb.org/manual/reference/operator/query/ Chapter 6, Accessing data in a NoSQL database 119 Writing sort expressions A sort expression is optional, and is only applicable to a data set that has either a FindQuery or MapReduce (with output to a collection) command operation. You must specify a MongoDB JSON expression to describe a custom sort operation on the result cursor. A sort value can be a positive value (>= 0 or true/false) for ascending order, or a negative value for descending order. Take note of the following: ■ If Flatten nested collections is enabled, a sort expression is applied to the result set before rows are flattened. ■ A sort expression is ignored when a MapReduce command outputs to inline using the syntax out: { inline: 1 }. In this case, you must specify the sort parameters directly in the MapReduce command expression. ■ Make sure that an index exists on the sort attribute, or set a runtime limit on maxRows. For more information about sort expressions and examples, see: http://docs.mongodb.org/manual/reference/method/cursor.sort /#cursor.sort Setting runtime connection properties You can set runtime connection properties for a MongoDB database connection in Advanced Data Set Run-time Properties or on the Property Binding page in the Data Set Editor. To display Advanced Data Set Run-time Properties, choose Advanced Settings on the Query page of the Data Set Editor. The following runtime connection properties are available: 120 ■ Maximum result objects in a batch The maximum number of result objects fetched in one batch by the MongoDB cursor. A MongoDB cursor typically fetches a batch of result objects and stores them locally. This value can be adjusted to optimize performance and limit data transfer. The smaller the value, the faster the response time. The higher the value, the fewer the calls to fetch data. The default value is 100. A zero or negative value is ignored. ■ Maximum documents to search for metadata The number of documents to process to obtain query result set metadata at runtime. The default value is 10. This value is different from the value in Number of Documents to search for fields on the Query page, which is only applicable at design time. ■ Maximum documents to skip Skips the first n documents in the result set. The default value is 0. For example, a collection contains 100 documents. If you set this property to 25, the result set contains documents 26 through 100. Provisioning Data ■ Index hint expression Contains a string value in Mongo JSON hint expression syntax that lists one or more index keys. A query uses index keys for query optimization. ■ Query cursor read preference The read preference of a query cursor operation. ■ Mode Set to primary, primaryPreferred, secondary, secondaryPreferred, or nearest. The default is primary. ■ Tag set(s) Sets of required tags configured for a member node. Tags are not applicable for primary read preference. You can specify alternate tag sets as documents in an array, for example: [ { "datacenter" : "CA", "rack" : "1" }, {"region" : "US_West" } ] ■ No cursor time out The MongoDB server normally times out idle cursors after an inactivity period of 10 minutes to prevent excess memory use. Set to true to prevent this. The default is false. ■ Allows partial results When used with sharding on a mongos server, set to true to allow partial results if any shards are down or missing from the cluster. If set to false, the mongos server returns an error. The default is true. ■ Flatten nested collections Set to true to flatten multi-level fields in an array or collection into multiple data set rows. This setting also applies to an array of scalar values. The default is false. By default, field values are concatenated into a single string value in a result set column. For example, the following result set contains three records with Flatten nested collections set to false. [{"followed":true,"ratings":[1.1,2.1,3.1]},{"followed":false, "ratings":[1.1,2.1,3.1]}] [{"followed":true,"ratings":[1.2,2.2,3.2]},{"followed":false, "ratings":[1.2,2.2,3.2]}] [{"followed":true,"ratings":[1.3,2.3,3.3]},{"followed":false, "ratings":[1.3,2.3,3.3]}] With Flatten nested collections set to true, the result set contains six records. {"followed":true,"ratings":[1.1,2.1,3.1]} {"followed":false,"ratings":[1.1,2.1,3.1]} {"followed":true,"ratings":[1.2,2.2,3.2]} {"followed":false,"ratings":[1.2,2.2,3.2]} {"followed":true,"ratings":[1.3,2.3,3.3]} {"followed":false,"ratings":[1.3,2.3,3.3]} Chapter 6, Accessing data in a NoSQL database 121 Data row flattening applies to only one nested collection in a data set. If you select multiple nested documents, the first nested collection field specified in the Selected fields property on the Property Binding page in the Data Set Editor is flattened. If no nested collection is selected, a top-level field that is an array of scalar values is flattened. Accessing data in Salesforce.com Salesforce.com is a company specializing in software-as-a-service (SaaS), providing Customer Relationship Management (CRM) applications that run in the cloud. Organizations use these CRM applications to manage all aspects of their business activities with customers, including tracking the status of contracts and sales opportunities, accessing customer profiles and account histories, and communicating with decision makers. BIRT Designer Professional supports access to data in Salesforce.com, enabling you to integrate this business data in your reports. As with other types of data sources, for a report to use data from Salesforce.com, you must create the following BIRT objects: ■ A data source that contains the information to connect to a database in Salesforce.com ■ A data set that specifies the data to retrieve Connecting to Salesforce.com BIRT Designer Professional uses a DataDirect JDBC driver to connect to Salesforce.com. This third-party driver supports access to Salesforce objects through JDBC and SQL. You provide the URL to connect to a Salesforce instance, and your login credentials. You can also set performance-tuning properties to control the data communications between BIRT and Salesforce.com. The DataDirect JDBC driver maps Salesforce objects and fields to tables and columns the first time it connects to a Salesforce instance. The driver maps both standard and custom objects and includes any relationships defined between objects. The driver uses a local embedded database to instantiate the mapping of the remote data source objects to tables and the metadata associated with the tables. The embedded database is created in the directory from which the application is run and uses the user ID specified for the connection as the name of the database. The driver creates one database per user. Whenever a new Salesforce object is created, the local database must be updated so that the new object is visible. 122 Provisioning Data To force the creation of a new local database, add the parameter CreateDB=forceNew to the connection URL, for example: jdbc:actuate:sforce://login.salesforce.com;CreateDB=forceNew Alternatively, use the parameter RefreshSchema, for example: jdbc:actuate:sforce://login.salesforce.com;RefreshSchema Using either of these connection parameters requires additional processing to establish the connection. For more information about these parameters, refer to the DataDirect documentation: http://media.datadirect.com/download/docs/jdbc/alljdbc/wwhelp /wwhimpl/js/html/wwhelp.htm#href=userguide /rfi1359985908931.html How to create a Salesforce.com data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Actuate JDBC Salesforce.com Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Actuate JDBC Salesforce.com Data Source Profile, specify the properties to connect to a Salesforce instance, and optionally, change the performance-tuning settings. 1 In Database URL, type the URL to a Salesforce instance. 2 In User Name, type the user name to log in to the system. 3 In Password, type the user password and security token to log in to the system, for example, myPasswordXXXXXXXXX, where myPassword is the password and XXXXXXXXX is the security token. A user’s security token is assigned by a Salesforce.com administrator. A user can change his or her security token in Saleforce.com’s Personal Setup tool. 4 In Database Name, do one of the following: ❏ Deselect Use default, then type a filename prefix, or a path and a prefix. The driver uses this value to create or locate the set of Salesforce embedded database files. For example, if Database Name is set to AcmeData, the database files that are created or loaded have the format, AcmeData.. If you specify a path and a prefix, such as C:\Resources\Data\AcmeData, the driver creates or looks for database files AcmeData. in the C:\Resources\Data folder. Chapter 6, Accessing data in a NoSQL database 123 ❏ Select Use default. This option uses BIRT’s resource folder as the path and the User Name value as the filename prefix. 5 In Connection Retry Count, specify the number of times to attempt to connect until a successful connection is established. 6 In Connection Retry Delay, specify the number of seconds to wait between each connection attempt. 7 In Fetch Size, specify the number of data rows that the driver processes before returning data to the application. A small fetch size can improve the initial response time of the query. A large fetch size can improve overall fetch times at the cost of additional memory. 8 In Web Service Fetch Size, specify the number of data rows that the driver fetches for each JDBC call. 9 In Web Service Retry Count, specify the number of times the driver retries a SELECT query that has timed out. 10 In Web Service Timeout, specify the number of seconds that the driver waits for a response to a web service request. Figure 6-28 shows an example of properties to connect to a Salesforce.com instance. Figure 6-28 124 Provisioning Data Connection properties to a Salesforce.com instance Specifying the data to retrieve from Salesforce.com Once the report connects to a Salesforce.com instance, you create a data set and write a SQL query to specify what data rows to retrieve. The driver translates the SQL query to a SOQL query before sending it to Salesforce.com for execution. SOQL (Salesforce Object Query Language) is Salesforce.com’s query language for accessing data, and is similar to SQL. You cannot write a SOQL query in the data set editor. How to specify what data to retrieve from a database in Salesforce.com 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Actuate JDBC Salesforce.com data source to use. Data Set Type displays Select Query. 2 In Data Set Name, type a name for the data set. 3 Choose Next. In New Data Set, in Query Text, type a SQL statement that indicates what data to retrieve. Figure 6-29 shows an example of a query specified in the data set editor. Figure 6-29 Data set editor displaying a SELECT query Chapter 6, Accessing data in a NoSQL database 125 3 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set. 4 Choose Preview Results to view the data rows returned by the data set. 126 Provisioning Data Chapter 7 Accessing data in files Chapter 7 This chapter contains the following topics: ■ About file data sources ■ Accessing data in an Excel spreadsheet ■ Accessing data in a text file ■ Accessing data in an XML file ■ Accessing data in a BIRT report document ■ Accessing data in an e.report Chapter 7, Accessing data in files 127 About file data sources A vast amount of corporate data is stored in relational databases and, increasingly, in NoSQL databases. In addition, much corporate data is stored in files such as Excel spreadsheets, text files (also known as flat files), and XML files. BIRT Report Designer supports accessing data in these types of files. BIRT Designer Professional also supports accessing data in BIRT report documents and e.reports. e.reports are reports designed in e.Report Designer Professional, Actuate’s legacy report designer. Accessing data in an Excel spreadsheet One of the most recognizable document formats in use in business settings today is an Excel spreadsheet. Like other spreadsheet programs, Excel presents data in a grid of rows and columns, which intersect at cells. Cell data can include numbers, text, and formulas. A spreadsheet also can include other features, such as graphs. BIRT Designer includes an out-of-the-box data connector to support using an Excel spreadsheet (XLS or XLSX) as a data source for a report design. As with other types of data sources, for a report to use data from Excel, you must create the following BIRT objects: ■ A data source that contains the information to connect to an Excel data source ■ A data set that specifies the data to retrieve Supported data types BIRT supports the following Excel data types: 128 ■ INT ■ DOUBLE ■ STRING ■ DATE ■ TIME ■ DATETIME ■ TIMESTAMP ■ BIGDECIMAL ■ BOOLEAN Provisioning Data Connecting to an Excel workbook BIRT Report Designer provides a connector to connect to Excel workbooks that are structured so that the column names appear in the first row. You can extract data from an entire workbook or an individual worksheet. How to create an Excel data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select Excel Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. 3 In New Excel Data Source Profile, specify the workbook to use as the data source. 1 Use the drop-down menu for Browse to select either Relative path or Absolute path, as shown in Figure 7-1. Figure 7-1 Selecting the type of path to specify 2 In Select file, specify the Excel file to use as the data source. 3 Select additional options, as necessary: ❏ Use first line as column name indicator. ❏ Use second line as data type indicator. Figure 7-2 shows an example of properties to connect to an Excel data source. Figure 7-2 Connection properties for an Excel data source Chapter 7, Accessing data in files 129 Specifying the data to retrieve from an Excel workbook Once the report connects to an Excel data source, you create a data set to specify which data to retrieve. How to specify what data to retrieve from an Excel workbook 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the Excel workbook. Data Set Type displays Excel Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New Excel Data Set, specify the data to retrieve: 1 In Select worksheet, select the worksheet from which to retrieve data. 2 In the left pane, select the columns to include in the report design, then choose the right arrow. The columns appear in the right pane, as shown in Figure 7-3. Figure 7-3 Specifying data to retrieve from an Excel data source 3 To rearrange the columns, select a column in the right pane, and choose the up or down arrow to move the column in the list, as shown in Figure 7-4. 130 Provisioning Data Figure 7-4 Rearranging columns in the Excel data set 4 To select columns from another worksheet, repeat substeps 1–3. 4 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set, as shown in Figure 7-5. Figure 7-5 Viewing output columns for an Excel data set 5 Choose Preview Results to view the data rows returned by the data set. Accessing data in a text file A BIRT report can access data from a text file, or flat file, that conforms to a defined and consistent structure. These files are typically generated by business Chapter 7, Accessing data in files 131 systems and applications that create logs. These files can also be spreadsheets saved as comma separated values (CSV) files. Before using text file data in a report, make sure the file uses a valid structure, and if necessary, edit the text accordingly. Text file structure A text file used for report data must have the following structure: ■ The first line of the text file can contain the names of the columns, separated by commas. If the first line does not contain column names, BIRT assigns default names, such as Column_1, Column_2, and so on. ■ The second line of the file can specify the data types of the columns. See Table 7-1 for a list of supported data types. If you use the second line to specify data types, list the data types in the same order as the columns, and separate the data types with a comma, semicolon, tab, or pipe. ■ The remaining lines in the file must contain values for the columns. The values can be separated by a comma, semicolon, tab, or pipe. ■ Each line must contain the same number of fields. ■ The file cannot include empty lines between records. ■ Each record must occupy a separate line, delimited by a line break, such as CRLF or LF. The last record in the file can either include or omit an ending line break. ■ Data in a field can be surrounded by more than one set of quotation marks. Quotation marks are required only if the data contains one or more commas within a field. A field can enclose single quotation marks and commas with double quotation marks, such as: "He said, 'Yes, I do.'" ■ If a field without content has zero or more spaces, the field is treated as NULL and evaluated as NULL in comparison operations. ■ The file name and extension can be any name that is valid for your operating system. Although TXT and CSV are typical, you do not have to use either as the file extension. The following example shows a valid sample text file. The text file has two lines of metadata and three lines of data. The first line lists the column names, and the second line lists the data types. FamilyName,GivenName,AccountID,AccountType,Created STRING,STRING,INT,STRING,TIMESTAMP "Smith","Mark",254378,"Monthly",01/31/2003 09:59:59 AM "Johnson","Carol",255879,"Monthly",09/30/2004 03:59:59 PM "Pitt","Joseph",255932,,10/01/2005 10:32:04 AM 132 Provisioning Data Text file data types Table 7-1 lists and provides information about the abbreviations you use for the data types. Table 7-1 Supported data types in flat files Abbreviation Data type Examples BIGDECIMAL java.sql.Types.NUMERIC DATE java.sql.Types.DATE DOUBLE java.sql.Types.DOUBLE INT java.sql.Types.INTEGER STRING java.sql.Types.VARCHAR TIME java.sql.Types.TIME hh:mm:ss Examples: 12:59:59 AM 12:59:59 pm TIMESTAMP java.sql.Types.TIMESTAMP YYYY-MM-DD hh:mm:ss.nnnnnn YYYY-MM-DD or MM/DD/YYYY Examples: 2003-01-31 01/31/2003 Creating a flat file data source When creating a flat file data source in BIRT, you specify its property values, such as the file location and the character set that the text file uses. You should also know how the text file is structured—whether the file uses commas, semicolons, tabs, or pipes to separate values, and whether the file specifies column names or data types. How to specify the connection information for accessing a text file 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, supply the following information: 1 Select Flat File Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. New Flat File Data Source Profile appears, as shown in Figure 7-6. Chapter 7, Accessing data in files 133 Figure 7-6 Selecting flat file directory and character set 3 Specify the following connection information for the text file: 1 In Select folder, type the location of the folder, or choose Browse to navigate to and select the folder. 2 In Select charset, select the character set that the text files in this folder use. 3 In Select flatfile style, select either CSV, SSV, PSV, or TSV for a file that uses comma-separated values, semicolon-separated values, pipe-separated values, or tab-separated values, respectively. 4 If the first line of the text file specifies the column names, select the following option: Use first line as column name indicator 5 If the second line of the text file specifies the column data types, select the following option: Use second line as data type indicator 4 Choose Finish. The new flat file data source appears under Data Sources in Data Explorer. Specifying what data to retrieve from a text file “Text file structure,” earlier in this chapter, describes the structure of data in a text file and provides an example. In the example, the structure of column names followed by rows of values resembles values stored in the table structure required by BIRT reports. This resemblance in data structure makes it easy to select the data to retrieve from a text file. The data set wizard displays the column names from the text file, and you select the columns that contain the data you want. 134 Provisioning Data How to create a data set to retrieve data from a text file This procedure assumes you have already created the flat file data source that this data set uses. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the flat file data source to use. Data Set Type displays Flat File Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In Select Columns, in File filter, select the file-name extension of the text file. 4 In Select file, select a text file from the drop-down list. The left pane displays the columns that are available in the selected file. 5 Select the columns to retrieve, and move them to the right pane. You can select columns in either of the following ways: ■ Select a column, then choose the arrow button. ■ Press Shift while you click to select multiple columns, then choose the arrow button. Figure 7-7 shows an example of all columns selected from a text file. Figure 7-7 Selecting columns from a text file 6 Choose Finish to save the data set. Edit Data Set displays the columns you selected, and provides options for editing the data set. Chapter 7, Accessing data in files 135 Accessing data in an XML file XML is an open specification recommended by the World Wide Web Consortium, and has become a common mechanism for sharing structured data across different information systems. The XML document that a report accesses must be well-formed. To be well-formed, it must conform to the XML 1.0, third edition specification. You can find more information about this specification at the following URL: http://www.w3.org/TR/REC-xml/ Creating an XML data source When creating an XML data source in BIRT, you specify the location of the XML file. You can also specify the location of an XML schema. An XML schema contains a set of rules to which an XML file conforms. You can use the schema to validate the XML file. How to specify the connection information for accessing an XML file 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, supply the following information: 1 Select XML Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. 3 Choose Next. New XML Data Source Profile appears, as shown in Figure 7-8. Figure 7-8 Defining XML source and schema information 3 Specify the following connection information: 136 Provisioning Data 1 In the first field, type the location of the XML file, or choose Browse to navigate to and select the file. 2 In the second field, type the location of the schema file, if one is available, or choose Browse to navigate to and select the file. A schema is not required. 3 Select the type of encoding for the XML file and schema. Use Auto to specify that the data source detect the encoding type specified in the XML file or schema. 4 Choose Finish. The new XML data source appears under Data Sources in Data Explorer. Specifying what data to retrieve from an XML data source BIRT reports must use data that is structured as a table consisting of rows and columns. XML documents use elements and attributes to present data. The data set wizard enables you to map a top-level XML element as a data set row, and other XML elements or attributes as columns. The wizard uses XPath expressions to define the paths to elements and attributes. XPath is a query language used to access parts of an XML document. When you select an element or attribute to map to a row or a column, the wizard generates the corresponding XPath expression. If you are familiar with XPath syntax and you want to do more than the basic mapping, you can write your own XPath expressions. This section describes the most common ways to write an XPath expression to use an element or attribute as a row or column in an XML data set. This section is not a substitute for formal XPath user documentation. Examples in this section refer to the following sample XML document: The Spanish Cook Book 2005 Everyone is Super Special 2005 Chapter 7, Accessing data in files 137 The most important syntax rules to consider when writing XPath expressions to define rows and columns are as follows: ■ Any path that starts with a forward slash (/) is an absolute path to an element. ■ Any path that starts with two forward slashes (//) selects an element at any location. ■ The XPath expression that defines the row mapping can use either of the previous path syntaxes. The following XPath expression selects all book elements that are children of library: /library/book The following XPath expression selects all book elements at any location: //book ■ For attribute paths, use a single forward slash or left and right brackets. For example, the following paths are equivalent: title/@lang title[@lang] ■ To define an element’s attribute as a column, use either of the following syntax forms: author/@name author[@name] ■ To define a table-level attribute as a column, use either of the following syntax forms: /@category [@category] ■ To filter data rows, use either of the following predicate expression syntaxes: ■ Single-position predicates in the abbreviated form. The following example selects the first author listed: author[1] 138 Provisioning Data ■ Single-equality conditions based on an attribute value. For example, select an element by using the value of an attribute of the element. In the following example, only books that are in English are selected: title[@lang='eng'] ■ XPath functions are not supported. How to create a data set to retrieve data from an XML document This procedure assumes you have already created the XML data source that this data set uses. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the XML data source to use. Data Set Type displays XML Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New XML Data Set, specify the XML source in one of the following ways: ■ To use the file specified in the XML data source, select Use the XML file defined in data source. ■ To select a file that is not specified in the data source, select the second option. In the text box, type the path to the XML file, or choose Browse to navigate to and select an XML file. Choose Next. 4 Define the row mapping, using the following steps: 1 In XML Structure, navigate to the XML element that represents a data set row, and select the element. Choose the right arrow. The Select or edit the XPath expression dialog displays an XPath expression that corresponds to the element you selected, as shown in Figure 7-9. 2 Select one of the generated XPath expressions or type another expression. For example, you can type a filter expression to retrieve only rows that meet a certain condition, as shown in the following example: /library/book/author[@country='us'] Choose OK, then choose Next. Chapter 7, Accessing data in files 139 Figure 7-9 Creating an XML data set 5 Define the column mapping, using the following steps: 1 In XML structure, navigate to and select the XML element or attribute that represents a column, then choose the right arrow. Column Mapping displays the default column mapping properties for the element or attribute you selected. Figure 7-10 shows an example of the default column mapping for an attribute named category. Figure 7-10 Column mapping properties 2 Select one of the generated XPath expressions or type another expression. Choose OK. 6 Repeat the preceding steps for every column to add to the data set. Figure 7-11 shows an example of column mappings defined in a data set. 140 Provisioning Data Figure 7-11 Mapping columns for an XML data set 7 Choose Finish to save the XML data set. Edit Data Set displays the columns, and provides options for editing the data set. Accessing data in a BIRT report document A report document is a binary file that contains report design information and cached data. It has the file extension .rptdocument. You can generate a report document from a report design, then use the report document as a data source for other reports. Using data from a report document provides the following benefits: ■ Faster report-generation time because the report does not have to connect to an external data source, such as a database or web service, to retrieve data. ■ Reuse of calculated data. If a table or chart in a report document contains calculated data, such as aggregations, that data is available to your report design. ■ Simplifies report creation. You do not need to gather all the information to connect to an external data source, nor do you need to create the query to retrieve data from the data source. Eliminating these steps is particular useful if data retrieval requires a complex SQL query to get data from a database, or a complex SOAP request to get data from a web service. A report document provides efficient access to data, but at the cost of data possibly being out of date. It is most useful in cases where connecting to and querying a data source is resource-intensive, or when the data changes infrequently. Chapter 7, Accessing data in files 141 As with other types of data sources, for a report to use data from a report document, you must create the following BIRT objects: ■ A data source that contains the information to connect to a report document ■ A data set that specifies the data to use from the report document Creating a report document Generate a report document from a report design by choosing Run➛Generate Document. Specify the folder in which to save the report document. To share the report document with other report developers, put the file in a shared resource folder. When used as a data source, the report document provides access to its result sets. At report generation, a result set is created for each table, chart, cross tab, and list. The data in a result set is defined by the data column bindings created for a table, chart, cross tab, or list. A report design has access to all result sets in a report document, except for a cross tab’s result set. For example, Figure 7-12 shows a report that displays sales data in two tables and a chart. Table Chart Table Figure 7-12 A report displaying data in two tables and a chart A report design that uses a report document generated from this report has access to the result sets generated for the two tables and the chart. When you create a 142 Provisioning Data data set to specify which result set data to use, the data set editor displays all the available results sets, as shown in Figure 7-13. Figure 7-13 Data set editor displaying available result sets with default names The result sets are organized under three categories: Bookmark, Element ID, and Result Set ID. The result sets named __bookmark_1, __bookmark_2, and __bookmark_3 correspond to the result sets for the first table, the chart, and the second table, respectively. Similarly, the result sets named Table_252, Chart_312, and Table_6 correspond to the result sets for the first table, the chart, and the second table. Finally, the result sets named QuRs0, QuRs1, and QuRs2 also correspond to the result sets for the first table, the chart, and the second table. While it seems redundant to provide three methods to select any given result set, each method offers different benefits. ■ For result sets that appear under Bookmark, the report developer designing the report from which a report document is generated can specify descriptive names. The name can be a literal string or an expression. ■ For results sets that appear under Element ID, the report developer can also specify descriptive names, but can only use literal strings to do so. Result sets are also organized hierarchically if a report, such as a master-detail report, contains nested tables. ■ For result sets that appear under Result Set ID, the report developer cannot specify alternate names. These generated names are useful for accessing a result set programmatically. Figure 7-14 shows another example of the data set wizard displaying the result sets available to a report design. Chapter 7, Accessing data in files 143 Figure 7-14 Data set editor displaying two result sets with custom names This time, the result sets under Bookmark use custom names, rather than the default __Bookmark_#. Result sets under Element ID are organized hierarchically under Orders and Payments, which are custom names. This hierarchy reflects the structure of data in the report document. Specifying bookmark names As Figure 7-13 showed, BIRT generates a bookmark for each result set, and assigns bookmark names using the __bookmark_# format. If you are designing the report from which to generate a report document, you can specify more descriptive bookmark names to better identify the data in result sets. For example, instead of using the default name, __bookmark_1, to refer to the result set for the sales summary table, specify a name, such as Sales Totals by State. How to specify a bookmark name 1 Open the report design to be used to generate a report document. 2 In the report layout, select the table, chart, or list for which to specify a bookmark. 3 In Properties Editor—Properties, choose Bookmark. 4 In Bookmark, specify one of the following: 144 Provisioning Data ■ A name, such as "Sales Totals by State". If typing a name, enclose it within double quotation marks, as shown in Figure 7-15. Figure 7-15 ■ Specifying a bookmark name for an element An expression, such as the following: row["COUNTRY"] "Order: " + row["ORDERNUMBER"] Specify an expression if, for example, a table is nested within another table or a list. In this case, the generated report document typically contains multiple instances of the inner table, and each table instance has a result set. If each table instance provides data about a particular sales order, for example, it is useful to identify each result set by order number, as shown in Figure 7-14. 5 Save the report. Specifying element names BIRT also assigns an element ID for each result set. As Figure 7-13 showed, each ID consists of the element type, an underscore, and a number, for example, Table_252 or Chart_312. If you are designing the report from which to generate a report document, you can specify more descriptive element IDs to describe the data in result sets. How to specify an element name 1 Open the report design to be used to generate a report document. 2 In the report layout, select the table, chart, or list for which to specify a name. 3 In Properties Editor—Properties, choose General. Notice that the Name property is undefined. Notice, too, that the Element ID property has a read-only number, which BIRT automatically assigns to every report element. BIRT uses this number in the default result set name. 4 In Name, type a descriptive and unique name. A report cannot contain duplicate element names. Figure 7-16 shows an example of a name, Sales Chart, specified for a chart. Chapter 7, Accessing data in files 145 Figure 7-16 Specifying a name for a chart element 5 Save the report. Connecting to a report document When creating a data source to connect to a report document, the only information required is the location and name of the report document. How to connect to a report document 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select BIRT Report Document Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. Choose Next. 3 In New BIRT Report Document Data Source Profile, use one of the following steps to provide a value for Report Document Path: ■ To select a report document in the resource folder, choose Browse. Select the report document, then choose OK. ■ To select a report document in a location other than the resource folder, click the arrow next to Browse, and choose Absolute Path. Navigate to the folder where the report document is stored, select the report document, then choose OK. Figure 7-17 shows an example of a report document data source that connects to a file named SalesReport.rptdocument in the resource folder. 4 Choose Finish. The data source appears under Data Sources in Data Explorer. The report now has access to the data saved in the report document. 146 Provisioning Data Figure 7-17 Report document selected Specifying the data to retrieve from a report document Because a report document contains cached data in the form of result sets, all you do is select the result set or result sets whose data to use in a report design. You must create a data set for each result set. How to retrieve data from a report document 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the report document data source to use. Data Set Type displays BIRT Report Document Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. New BIRT Report Document Data Set displays all the result sets in the report document. These items are organized under Bookmark, Element ID, and Result Set ID. 3 Expand a category, then select a result set to see its data columns, as shown in Figure 7-18. 4 Select the result set that contains the data to use in the report, then choose Finish. Edit Data Set displays the columns in the data set, as shown in Figure 7-19. 5 Choose Preview Results to view the data rows that the data set returns. 6 Choose OK to close the data set editor. Chapter 7, Accessing data in files 147 Figure 7-18 Selecting a result set Figure 7-19 Columns in a data set Accessing data in an e.report The ActuateOne for e.Reports Data Connector is an open data access (ODA) driver that enables BIRT to retrieve data from an e.report, a report designed and 148 Provisioning Data generated using Actuate e.Report Designer Professional. The e.report is a report object instance (.roi) file. Using the ActuateOne for e.Reports Data Connector driver to access the data and business logic from the existing e.report's data schema preserves the business logic in the report while BIRT consumes the data. About ActuateOne for e.Reports Data Connector functionality A BIRT data object, information object, or report design can use the data connector to retrieve data from an existing ROI file in a volume on Actuate BIRT iServer Release 11 or higher. The connector does not support an ROI file in a volume on Actuate BIRT iHub. The ROI file does not need to be on the same volume as the BIRT report design. The report object design (.rod) file used to generate the ROI file must have searchable fields defined. The ActuateOne for e.Reports Data Connector uses the display names for the searchable fields to name the columns in the BIRT data set. An ActuateOne for e.Reports data source connects to a volume. An ActuateOne for e.Reports data set maps controls in a selected e.report to data set columns. The user interface for creating the data set displays the control names as available columns. The names of the columns that the e.report data source uses are not available. Accessing an e.report using Page Level Security The ActuateOne for e.Reports Data Connector respects the Page Level Security (PLS) privileges implemented in the source e.report. The BIRT report developer must specify user credentials for an ROI file in order to retrieve data from that ROI file. PLS privileges of the specified user restrict the data that the BIRT report design retrieves from the ROI file. Accessing an e.report having multiple sections A BIRT report design can retrieve data from a complex, multi-section e.report. Each data set, however, can access data from a single section only. For example, if an e.report contains parallel sections to display an orders report and a payments report side by side, a data set can access data either from the orders report or the payments report, but not from both. A data set is also limited to accessing data from either data controls or page controls, but not from both. Typically, a BIRT report uses data from data controls. To determine which controls are available for use in the same data set, open the ROD file in e.Report Designer Professional to view the report structure. If the ROD file is not available, use the Output Columns page in Edit Data Set in BIRT Designer Professional to examine the partially scoped name of each column. The Chapter 7, Accessing data in files 149 name’s scope appears before the scope resolution operator (::), as shown in the following example: SalesRepTitleFrame::SalesRepTotal Connecting to an e.report When creating an ActuateOne for e.Reports data source in a BIRT report to connect to an e.report, you specify the iServer volume that contains the ROI files. You specify iServer connection properties and the name of the iServer volume. Specify the use of a trusted connection to improve performance. As you edit the data source and the data set, a trusted connection uses the same session to communicate with the iServer. A non-trusted connection uses the specified credentials to log in to the iServer for each communication. How to create an ActuateOne for e.Reports data source 1 In Data Explorer, right-click Data Sources and choose New Data Source. 2 In New Data Source, select ActuateOne for e.Reports Data Source. Choose Next. 3 In New ActuateOne for e.Reports Data Source Profile, type the URL of the iServer, the user credentials, the volume name, and whether the connection is trusted, as shown in Figure 7-20. Figure 7-20 Creating a data source profile 4 Choose Test Connection to ensure that the connection information is correct. If Test Connection returns an error, repeat the preceding steps to correct the error. Choose OK if the connection is successful. 5 Choose Finish. 150 Provisioning Data Specifying the data to retrieve from an e.report The controls in an e.report provide the columns for a data set. As discussed earlier, you must use columns from controls in only a single section in the e.report, and you must use either data controls or page controls. The scoped name of the column shows the section that contains the control. This name is available in the user interface for editing the data set. How to create an ActuateOne for e.Reports data set 1 In Data Explorer, right-click Data Sets and choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the ActuateOne for e.Reports data source to use. Data Set Type displays ActuateOne for e.Reports Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In Select Columns, choose Browse. 4 Navigate to the location of the ROI file and select the ROI file, as shown in Figure 7-21. Choose OK. Figure 7-21 Selecting the ROI file 5 In Select Columns, choose Refresh Columns. Available Columns displays the column names from the ROI file, as shown in Figure 7-22. 6 Select the columns to include in the report. Choose the right arrow to move the columns to the Selected Columns pane, as shown in Figure 7-23. Chapter 7, Accessing data in files 151 152 Figure 7-22 Available columns in Detail.roi Figure 7-23 Selecting columns Provisioning Data Choose Finish. 7 In Edit Data Set, select Preview Results. Figure 7-24 shows the data rows returned by the data set. Figure 7-24 Preview of data set results If the error message shown in Figure 7-25 appears, you selected columns from different sections in the e.report. Figure 7-25 Error message displayed when the selected columns are from different sections If the error message shown in Figure 7-26 appears, you selected columns from both data and page controls. Figure 7-26 Error message displayed when selected columns are from data and page controls Chapter 7, Accessing data in files 153 8 To resolve these errors, perform the following steps: 1 In the error message, choose OK. 2 Choose Output Columns. 3 Expand Name to make the full names of the output columns visible, as shown in Figure 7-27. Figure 7-27 Checking scope prefixes Note each name’s prefix. In the example shown in Figure 7-27, the first three columns are in the SalesRepTitleFrame component, and the fourth is in the Page component. This information indicates that the first three columns are from data controls, and the fourth column is from a page control. 4 Choose Select Columns. In Selected Columns, remove the incompatible columns. 5 Choose Preview Results to verify that the data set returns data rows. 154 Provisioning Data Chapter 8 Chapter 8 Accessing data from a web service This chapter contains the following topics: ■ Using web services data in a report ■ Connecting to a web service ■ Specifying what data to retrieve from a web service ■ Creating a web services data source using a custom connection class Chapter 8, Accessing data from a web service 155 Using web services data in a report Web services make software functionality available over the internet, using open and standard protocols, in the following manner: ■ A web service defines a structured format for requests for its service and for the response the service generates. ■ An application—a BIRT report, for example—makes a request for the web service over the internet. ■ The web service performs an action and sends the results to the application. The action can be calculating a monthly mortgage payment, retrieving stock quotes, converting currencies, getting the weather forecast for a particular city, or finding books by a particular author. If a report needs to present data that can only be generated by a program, you can save time and effort by looking for a web service that does the programming work and returns the data you need. Web services use the following standard protocols: ■ WSDL (Web Services Description Language) to describe the available services or operations provided by a web service ■ SOAP (Simple Object Access Protocol) to transfer data ■ XML to structure the data Because web services are written and maintained by other developers, always examine the web service definition to ensure that the web service does what you need. You should also verify that the WSDL document is well formed. Remember, too, that you have no control over web services created by others, and if a web service your report accesses is modified significantly or removed, the functionality no longer works in the report. Connecting to a web service When creating a web service data source in BIRT, you typically need to specify only the location of the WSDL file. A well-formed WSDL file defines the available services, and typically, the information to connect to the SOAP server identified by a SOAP endpoint URL. Alternatively, you can connect to the web service through a custom driver class. You would create and use a custom driver, if, for example, the web service does not provide a WSDL document. 156 Provisioning Data How to specify the connection information for accessing a web service This procedure shows how to connect to a public web service that returns the weather forecast for a specified U.S. zip code. You can get information about this web service at the following location: http://wiki.cdyne.com/?title=CDYNE_Weather 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, supply the following information: 1 Select Web Services Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. For this example, type: Weather_ws 3 Choose Next. 3 In New Web Services Data Source Profile, in WSDL URL or Location, type the following URL, as shown in Figure 8-1: http://wsf.cdyne.com/WeatherWS/Weather.asmx?WSDL For well-defined web services, you need only specify the URL to the WSDL document. Figure 8-1 URL to WSDL specified Chapter 8, Accessing data from a web service 157 4 Choose Test Connection to verify the connection to the web service. 5 Choose Finish. The new web service data source, Weather_ws, appears under Data Sources in Data Explorer. Specifying what data to retrieve from a web service A web service provides application functionality from a remote server, which makes the creation of a web service data set more complex than the other data set types. While the wizard guides you through the steps to supply the necessary information, it helps to understand generally how the data set communicates with a web service. The data set performs the following tasks: ■ It sends a SOAP request to the web service. The request specifies the action you want the web service to perform and the parameter values to use when performing the action. For example, the request can be to run a mortgage calculator, and the parameter values to send are the loan years, interest rate, loan amount, annual tax, and annual insurance. ■ The data set specifies the format of the SOAP response to use when the web service sends data back to the report. ■ Finally, the data set specifies the data that the web service should return. For example, in the case of a mortgage calculator, you can choose to get just the total monthly mortgage payment, or get additional data, such as monthly principal and interest, monthly tax, and monthly insurance. The data set wizard constructs the SOAP request and response based on the web service operation and schema you select. For well-defined web services, you can use the generated SOAP request and response without any modifications, so while knowledge of SOAP can be helpful, it is typically not required. The wizard also displays, in XML format, all the data that the operation can return, so all you do is select the data that you want to use in the report, then map the data to rows and columns. How to create a data set to retrieve data from a web service This procedure shows how to retrieve data from a public web service that returns the weather forecast for a specified U.S. zip code. This procedure uses the web service data source, Weather_ws, for which the creation procedure is provided in the previous topic. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the web services data source, Weather_ws. Data Set Type displays Web Services Data Set. 2 In Data Set Name, type a name for the data set, then choose Next. 158 Provisioning Data New Web Services Data Set displays the URL to the WSDL document, which describes the services or operations provided by the web service. You specified the WSDL URL when you created the data source. 3 Expand the WSDL URL. Expand WeatherSoap, then select GetCityForecastByZIP, as shown in Figure 8-2. Figure 8-2 Selecting the GetCityForecastByZIP operation from WeatherSoap Choose Next. New Web Services Data Set displays the parameter used by the GetCityForecastByZIP operation, as shown in Figure 8-3. The parameter, selected by default, defines the zip code for which to return the weather forecast. Chapter 8, Accessing data from a web service 159 Figure 8-3 Parameters and specified default values Choose Next. BIRT generates a SOAP request template based on the WSDL document and your parameter selection, as shown in Figure 8-4. The body section of the SOAP request tells the web service to run the GetCityForecastByZIP operation, using the value of the ZIP parameter. The parameter notation, &?ZIP?&, indicates that a value can be inserted in the SOAP request at run time. 160 Provisioning Data Figure 8-4 SOAP request template 4 Specify a value for the ZIP parameter, using the following steps: 1 Choose Edit Parameter. 2 In SOAP Request, select the ZIP parameter, then choose Edit. 3 In Edit parameter property, in Default Value, type a zip code, such as 94044. Choose OK. 5 Choose OK to save your changes to the SOAP request. Choose Next. New Web Service Data Set displays the options for constructing a SOAP response, as shown in Figure 8-5. Chapter 8, Accessing data from a web service 161 Figure 8-5 Options for constructing a SOAP response 6 Choose Next to accept the default options, which is the typical case for most well-defined web services. New Web Services Data Set displays an XML structure of the web service. 7 Select an element to map to a data set row, using the following steps: 1 In XML Structure, expand all the items, then select Forecast, as shown in Figure 8-6. Forecast defines the data root node of the data row returned by the weather forecast service. 162 Provisioning Data Figure 8-6 Select an element to map as a table 2 Click the > button. The Select or edit the XPath expression dialog, shown in Figure 8-7, prompts you to select a generated XPath expression, or to write a custom XPath expression. The expression defines the path to the selected XML element. Chapter 8, Accessing data from a web service 163 Figure 8-7 Options for specifying an XPath expression 3 Choose OK to accept the generated XPath expression selected by default. 4 Choose Next. 8 In Column Mapping, select the elements to map to columns. 1 In XML Structure, expand Forecast, then Temperatures. The GetCityForecastByZIP operation returns several categories of data. For this example, assume that you want to retrieve only the following data: Date, MorningLow, and DaytimeHigh contained in the Forecast node and City, which is at a higher level than Forecast. 2 Select the City element, which is at the same level as ForecastResult. Click the > button to create the XPath expression that maps the selected element as a column. In Column Mapping, accept the default data type of String, then choose OK. 3 Select the Date element and click the > button. In Column Mapping, change the data type to Date Time, then choose OK. 4 Select MorningLow and click the > button. In Column Mapping, change the data type to Integer, then choose OK. 5 Select DaytimeHigh and map it to an Integer column as for MorningLow. 9 Choose Finish. 10 In Edit Data Set, choose Preview Results. The data set returns the date, and the maximum and minimum temperatures for zip code 94044, as shown in Figure 8-8. 164 Provisioning Data Figure 8-8 Results returned by the web service data set Choose OK. When you use this data in a report, each time the report is run, the data set gets the weather forecast for the current day and for zip code 94044. While this data may be all you want to display in your report, the typical case when using a web service is to enable the report user to supply parameter values at run time. In the weather forecast example, you can make the report more interesting by prompting the user to specify the zip code for which to get weather information. For many web services, such as a mortgage calculator or a currency converter, incorporating web service data in the report makes sense only if users can specify parameter values to get the data they want. The solution is to create report parameters to prompt the user for values. The user-specified values are then passed to the data set, which, in turn, passes those values to the web service through the SOAP request. The procedure for implementing this solution is described next. The procedure describes how to create a basic report parameter. How to enable a user to provide parameter values to a web service This procedure continues with the weather forecast example. 1 Create a report parameter to get the zip code at run time, using the following steps: 1 In Data Explorer, right-click Report Parameters, then choose New Parameter. Chapter 8, Accessing data from a web service 165 2 In New Parameter, supply the following information: ❏ In Name, type a name for the report parameter. For example: ZIP_rp ❏ In Prompt text, specify a word or sentence to prompt the report user to provide a zip code. For example: ZIP code for weather forecast ❏ In Data type, select String. ❏ Use the default values for the other attributes. Figure 8-9 shows the completed report parameter definition. Figure 8-9 Report parameter to get the zip code 3 Choose OK. The ZIP_rp parameter appears under Report Parameters in Data Explorer. 2 Edit the data set parameter and link it to the report parameter. 166 Provisioning Data 1 In Data Explorer, right-click the web service data set, then choose Edit. 2 In Edit Data Set, select Parameters. As Figure 8-10 shows, the Parameters page displays the ZIP parameter defined in the GetCityForecastByZIP operation. The parameter’s default value is set to 94044, which is the value specified when you defined the data set. Figure 8-10 Data set parameter 3 Select the ZIP parameter, then choose Edit. 4 In Edit Parameter, in Linked To Report Parameter, select ZIP_rp, as shown in Figure 8-11. In the message that appears, choose Yes, then choose OK. Figure 8-11 Link the data set parameter to the report parameter 3 Choose OK to save your changes to the data set. 4 Test the web service functionality in a report. Chapter 8, Accessing data from a web service 167 1 Drag the web service data set from the Data Explorer and drop it in the layout editor. In Data Set Binding, choose Select All. Then, choose OK. BIRT creates a table and data elements to display the weather data. Format the table to put the city field in the heading row, as shown in Figure 8-12. Figure 8-12 Report design to display the weather forecast data 2 Choose Run➛View Report➛In Web Viewer. Enter Parameters displays the report parameter to get the zip code value. 3 Type a zip code, as shown in Figure 8-13. Choose OK. Figure 8-13 Zip code specified for the report parameter The report displays weather data for that zip code. Figure 8-14 shows an example. Figure 8-14 168 Provisioning Data Report displaying weather data for the specified zip code Creating a web services data source using a custom connection class You can use a custom connection or driver class to create a web services data source. The custom driver class is responsible for returning an Input Stream that contains a SOAP XML response. This class must implement a connect( ) method. This method has to return an Object, that implements an executeQuery( ) and an optional disconnect( ) method. The connect( ) method accepts two parameters, which contain the connection properties and application context. BIRT uses an application context map to store values and objects for use in all phases of report generation and presentation. You can reference objects in the application context from Script, the Expression Builder, in the ODA layer, and so forth. The application context map contains specific name-value pairs that are passed to all generation and rendering processes. You can use the application context to pass a security identifier that can be validated in the connection class and passed in as part of the SOAP request. In many cases the web services require such identifiers. The Custom Driver Class If you specify a Custom Driver Class when defining your Web Services Data Source, the run-time driver uses that class to create connections to the web service. The custom driver class is also responsible for executing the web services queries for web services data sets associated with this data source. The custom driver class you provide must be the fully-qualified name of a Java class. The class must implement the following method to establish a web services connection. Specify the location of the JAR containing the custom driver class in Driver Class Path. public static Object connect( java.util.Map connectionProperties, java.util.Map appContext ); The connectionProperties parameter specifies the run-time values of all public connection properties available to the driver as a (String, String) map keyed by the connection property name. The map may contain any or all of the following map keys: soapEndPoint, connectionTimeOut, connectionClass, OdaConnProfileName, and OdaConnProfileStorePath. The appContext parameter provides all the application context values as (String, Object) pairs. Its value is never null, but its collection may be empty. Chapter 8, Accessing data from a web service 169 The connection instance The connect(...) method of the custom driver class, after establishing a successful connection, returns a non-null object that implements the following two methods. public Object executeQuery( java.lang.String queryText, java.util.Map parameterValues, java.util.Map queryProperties ); public void disconnect( ); The executeQuery( ) method The queryText parameter specifies the query text. It can be null if the connection class does not require the report design to provide a query text. The parameterValues parameter specifies values of all the data set parameters as a (String, Object) map keyed by parameter name. It can be null if the data set does not define any parameters. The queryProperties parameter specifies values of all the data set public properties as a (String, String) map keyed by property name. The map can contain the queryTimeOut map key. The query method must return a value of either of the following data types: ■ java.lang.String The returned String is the complete SOAP response. ■ java.io.InputStream The returned stream is SOAP response stream. The disconnect( ) method This method closes the connection. The driver implementation of the disconnect method is optional. If it is implemented, BIRT calls this method when the associated data source closes. You can use it to close file streams and clean up other resources. Method calling and error handling The driver calls the custom driver class using Java reflection. The class and connection object do not need to implement any predefined interface. Any exception thrown by any of the defined methods is treated as an error and results in a failure and a thrown ODA Exception. 170 Provisioning Data Custom connection class example Listing 8-1 is an example of a custom class that accesses a set of connection properties, then instantiates a query object. In this example the connection properties and the application context are not used. Listing 8-1 Custom driver class import java.util.Iterator; import java.util.Map; public class MyConnectionClass { public static Object connect( Map connProperties, Map appContext ) { Iterator it = connProperties.keySet( ).iterator( ); while ( it.hasNext( ) ) { String key = it.next( ); } it = connProperties.values( ).iterator( ); while ( it.hasNext( ) ) { // Get value String value = it.next( ); } MyWSQuery msq = new MyWSQuery( ); return msq; } } Listing 8-2 is an example of a query class implementation that executes a query by opening a file input stream and disconnects, closing the file. Listing 8-2 Query class implementation import java.io.FileInputStream; import java.io.IOException; import java.util.Map; public class MyWSQuery { FileInputStream fis = null; public Object executeQuery( String queryText, Map parameterValues, Map queryProperties ) { try { fis = new FileInputStream( "c:/temp/ExchangeRates.xml" ); } catch ( IOException e ) { e.printStackTrace(); } return fis; Chapter 8, Accessing data from a web service 171 } public void disconnect( ) { if( fis != null ) { try { fis.close( ); } catch (IOException e) { e.printStackTrace(); } fis = null; } } } Setting up a report that uses a custom driver class Once you have defined the custom driver and query classes for the Data Source, use Test Connection to confirm that the class loader can find your custom classes. A completed custom web services data source is shown in Figure 8-15. Figure 8-15 Defining a web services data source with a custom driver Note that a WSDL descriptor is provided as a placeholder even when it is not used by your custom driver class. Next, define a data set from this custom data source. The New Web Services Data Set wizard helps you define the access to your Web Services data. 172 Provisioning Data How to define a Web Services data set with a custom driver 1 In your report design, open the Data Explorer. 2 Right-click Data Sets and choose New Data Set. Provide a name for your data set and choose Next. 3 In New Web Services Data Set, perform the following actions: 1 In WSDL Operation, select an operation as shown in Figure 8-16 and choose Next. Figure 8-16 Bypassing WSDL Operation 2 In SOAP Parameters, choose Next. 3 In SOAP Request, choose Next. 4 In SOAP Response, in Select SOAP Response Schema, select Use schema from response. 5 In Select Sample SOAP Response message, select Use external sample data file and specify a file matching the XML you are accessing with your custom driver class.The complete SOAP Response definition is shown in Figure 8-17. Choose Next. Chapter 8, Accessing data from a web service 173 Figure 8-17 Defining SOAP Response 6 In Row Mapping, choose an area of the XML file that contains repeating nodes. Choose the arrow to populate the XPath expression. 7 In Select or edit the XPath Expression, choose XML elements name “” at any location. Choose OK. Choose Next. 8 In Column Mapping, select all of the elements that you want to appear in your report. Choose the arrow to populate the column mappings, as shown in Figure 8-18. Figure 8-18 174 Provisioning Data Defining Column Mapping 9 Choose Show Sample Data to preview the data from the XML source. Choose Finish. Test your report by dragging the new data set into the report design to create a table, then choose Run➛View Report➛In Web Viewer. Chapter 8, Accessing data from a web service 175 176 Provisioning Data Chapter 9 Chapter 9 Accessing data in a POJO This chapter contains the following topics: ■ Using POJO data in a report ■ Connecting to a POJO ■ Specifying the data to retrieve from a POJO Chapter 9, Accessing data in a POJO 177 Using POJO data in a report Plain Old Java Objects (POJOs) are simple Java objects that do not implement framework-specific interfaces such as those defined by the EJB framework. Java developers use POJOs to separate an application’s business logic from infrastructure frameworks, which constantly evolve. POJOs are frequently used for data persistence—the storage and retrieval of data—in Java applications. BIRT Designer supports the use of POJOs as a data source for reports. As with other types of data sources, such as databases, XML files, and web services, for a report to use data from a POJO, you must create the following BIRT objects: ■ A POJO data source that contains the information to connect to a POJO ■ A POJO data set that defines the data that is available to a report No programming is required to create these BIRT objects. However, if using POJOs created by another developer, a basic understanding of what the classes do and the data they provide is necessary. A simple POJO example typically consists of the following classes: ■ A class that describes the data object, for example, a books class that describes the properties of books, including book title, author, publisher, year published, and so on. ■ A class that specifies how to retrieve data. For example, such a class can retrieve data about each book by using the Java interface, Iterator, and implementing the open( ), next( ), and close( ) methods to iterate through all the book objects. Connecting to a POJO When creating a POJO data source in a BIRT report to connect to a POJO, you specify the location of the JAR file that contains the POJO classes. You can specify either a relative or absolute path. How to create a POJO data source 1 In Data Explorer, right-click Data Sources, then choose New Data Source. 2 In New Data Source, specify the following information: 1 Select POJO Data Source from the list of data source types. 2 In Data Source Name, type a name for the data source. Choose Next. 3 In New POJO Data Source Profile, shown in Figure 9-1, specify the properties to connect to the POJO. 178 Provisioning Data Figure 9-1 POJO data source properties 1 In Runtime Properties, specify the location of the POJO classes that define the run time properties of the data source. Click the arrow icon next to Add Jars, then select either Relative path or Absolute path. ❏ Select Relative path to specify a path that is relative to the folder designated as the resource folder. By default, BIRT uses the current project folder as the resource folder. In Select Jars/Zips, which displays the contents of the resource folder, as shown in Figure 9-2, select the JAR file, then choose OK. Figure 9-2 ❏ Select Jars/Zips displaying the contents of the resource folder Select Absolute path to specify the full path in the file system. Browse the file system and select the JAR file, then choose OK. Chapter 9, Accessing data in a POJO 179 2 In Design Time Properties, specify the location of the POJO classes that define the design time properties of the data source. The data set editor uses this information to list the get methods, which you select to define the column mappings for the POJO data set. Use the instructions in the previous step to specify either a relative path or absolute path to the POJO classes. 4 Choose Test Connection to ensure that the connection information is correct. If Test Connection returns an error, repeat the preceding steps to correct the error. 5 Choose Finish. The new POJO data source appears under Data Sources in Data Explorer. Specifying the data to retrieve from a POJO BIRT reports must use data that is structured as a table consisting of rows and columns. For a POJO data set to return data in this format, you map methods or members of a POJO class to columns. Listing 9-1 shows an example of a class that represents music CDs. The class describes the members and uses pairs of get and set methods to persist the data. To create a data set using this class, you would map the get methods to columns. Listing 9-1 Class representing music CDs package dataset; public class CD { private String private String private String private String private String private String cdTitle; cdArtist; cdCountry; cdCompany; cdPrice; cdYear; public CD(String title) { this.cdTitle = title; } public String getCDTitle() { return cdTitle; } public void setCDTitle(String title) { this.cdTitle = title; } public String getCDArtist() { return cdArtist; } 180 Provisioning Data public void setCDArtist(String artist) { this.cdArtist = artist; } public String getCDCountry() { return cdCountry; } public void setCDCountry(String country) { this.cdCountry = country; } public String getCDCompany() { return cdCompany; } public void setCDCompany(String company) { this.cdCompany = company; } public String getCDPrice() { return cdPrice; } public void setCDPrice(String price) { this.cdPrice = price; } public String getCDYear() { return cdYear; } public void setCDYear(String year) { this.cdYear = year; } } How to create a POJO data set This procedure assumes you have already created the POJO data source that this data set uses. Examples in this section refer to the POJO example in Listing 9-1. 1 In Data Explorer, right-click Data Sets, then choose New Data Set. 2 In New Data Set, specify the following information: 1 In Data Source Selection, select the POJO data source to use. Data Set Type displays POJO Data Set. 2 In Data Set Name, type a name for the data set. 3 Choose Next. 3 In New POJO Data Set, specify the following information: 1 In POJO Data Set Class Name, specify the POJO class that retrieves the data at run time. Choose Browse to find and select the class. Chapter 9, Accessing data in a POJO 181 2 In Application Context Key, use the default key or delete it. This property is optional. Figure 9-3 shows an example of properties set for a POJO data set. Figure 9-3 POJO data set properties 4 Choose Next. 5 Map methods or fields in a POJO class to data set columns, using the following steps: 1 In POJO Class Name, specify the POJO class that contains the get methods to map to columns. You can choose Browse to find and select the class. The data set editor uses a get* filter to display all the get methods in the specified POJO class, as shown in Figure 9-4. Figure 9-4 182 Provisioning Data Data set editor displaying the get methods in a POJO class 2 Double-click the get method to map to a data set column. Add Column Mapping displays the mapping information, as shown in Figure 9-5. Figure 9-5 Column mapping information Choose OK to accept the default values. 3 Repeat the previous step for every column to add to the data set. Figure 9-6 shows an example of column mappings defined in a POJO data set. Figure 9-6 Data set editor displaying the column mappings 6 Choose Finish to save the data set. Edit Data Set displays the columns, and provides options for editing the data set. 7 Choose Preview Results to view the data rows returned by the data set. Figure 9-7 shows an example of data rows returned by a POJO data set. Chapter 9, Accessing data in a POJO 183 Figure 9-7 184 Provisioning Data Data rows returned by a POJO data set Part Three 3 Modeling data Part 3 ■ Creating data objects ■ Providing data security Chapter 10 Creating data objects Chapter 10 This chapter contains the following topics: ■ About data objects ■ Design considerations ■ Building a data object ■ Building a data model ■ Providing cached data ■ Publishing a data object ■ Enabling incremental updates ■ Maintaining a data object Chapter 10, Creating data objects 187 About data objects A data object is a BIRT object that contains all the information necessary to connect to an external data source, retrieve data from that data source, and structure the data in a way that supports business analysis. Data objects are similar to data marts, which are simplified repositories of data gathered from corporate data sources and designed to address specific business queries. Data architects or report developers create data objects to provide data for the following items: ■ Dashboards, which users create using a dashboard application on BIRT iHub ■ BIRT reports that are created using either BIRT Designer Professional or Report Studio on iHub Data objects use Actuate’s in-memory analytics technology, which loads data in memory to speed up processing. Design considerations A data object is a collection of the following BIRT objects: ■ Data sources ■ Data sets ■ Data models ■ Data cubes ■ Report parameters A data object can include any number of data sources, data sets, data models, data cubes, and report parameters. Although it is possible to create a single data object that contains all the data that dashboard users or report developers could possibly need, a data object that provides too much data can be confusing for users. In addition, the amount of memory that a data object uses increases with the number of data rows returned by data sets and the number of aggregations calculated by cubes. If creating data objects for diverse groups of users or reports, evaluate how best to organize data into data objects and how much data to include in each data object. The objects to include in a data object depend on which item—dashboard or BIRT report—is using the data object. Table 10-1 lists the objects that you typically include when creating a data object for a dashboard and for a report. 188 Provisioning Data Table 10-1 Typical objects in a data object for a dashboard and a report Data source Data set Data model Cube Report parameter Dashboard ✓ ✓ ✓ ✓ ✓ BIRT report ✓ ✓ ✓ ✓ The following topics describe in more detail the guidelines for designing data objects for dashboards and reports. Designing data objects for dashboards A dashboard is a web application designed for business users who want to measure the effectiveness of their business processes and have this decision-support information displayed graphically. The data objects you create for dashboards need to extract the right data and provide it in a structure suitable for dashboard gadgets. Use the following guidelines when designing data objects for dashboard users: ■ Identify the users and create a data object or series of data objects for each user group. A typical approach is to create data objects by groups of users, where each data object fulfills a different business need. For example, executives, sales managers, and customer support managers represent three distinct user groups with different data requirements. Executives might be interested in viewing revenue by month or quarter. Sales managers might need to evaluate the sales numbers of individual sales representatives by month or quarter. Customer support managers might need to monitor support call volume by days. Depending on how iHub user accounts are set up, you might be able to leverage the defined user groups to identify the user groups by which to organize data objects. Contact the iHub administrator for this information. ■ Provide users with sufficient data that they can use to analyze by different dimensions and at different levels of detail. Users often need to view data from different dimensions. For example, sales managers might need to view sales by product line, by region, or by sales representative, and by different time periods. If viewing sales data by region, sales managers might need to drill down to view sales by cities within each region. ■ Design data sets, data models, and cubes to provide data that is suitable for the gadgets that will be used to display data. The dashboard provides a suite of gadgets for displaying data. Each gadget accesses data in the same way as the corresponding element does in a BIRT report. ■ All chart gadgets use data from a data set, a data model, or a cube. Chapter 10, Creating data objects 189 ■ ■ The cross tab gadget uses data from a data model or cube. ■ The table and summary table gadgets use data from a data set or a data model. In addition, summary tables require that each column in the data set be assigned the appropriate analysis type to provide the expected functionality. For more information, see “Configuring data set columns for summary tables,” later in this chapter. Design data sets or report parameters to provide lists of values to display in data selector gadgets. Just as a report parameter supports run-time filtering of data in a report, a data selector gadget enables a dashboard user to filter data in a chart, cross tab, table, or any other gadget that displays data. Figure 10-1 shows a dashboard that uses five gadgets to display sales data. Descriptions of each gadget follow the illustration. Data selector Figure 10-1 190 Column chart Linear gauge Line chart Cross tab Sample dashboard displaying five gadgets ■ The data selector gadget displays a list of countries. The data selector is linked to the column chart next to it. Dashboard users select values from the data selector to filter the data to display in the column chart. The data selector gets its values from a data set, a data model, a cube, or a report parameter. ■ The column chart gadget is linked to the data selector, as described previously. The column chart derives its data from a data set, a data model, or a cube. ■ The line chart gadget derives its data from a data set, a data model, or a cube. ■ The linear gauge gadget derives its data from a data set, a data model, or a cube. ■ The cross tab gadget derives its data from a data model or a cube. Provisioning Data Designing data objects for reports created with Report Studio Report Studio is a web-based report design tool on iHub designed for users who want to create reports quickly and easily without deep understanding of database architecture or report design techniques. Report Studio users create reports using predefined data sources and templates that provide the data and basic layout for their reports. The data objects you create should provide the data in a structure that is appropriate for business users and for the report elements that users can add to a report. The design guidelines discussed in the previous topic apply here as well, except for the following: ■ In Report Studio, a chart uses data from a table in a report. The chart does not use data directly from a data set, a data model, or a cube. ■ Report parameters in a data object do not link to parameters created in Report Studio, so you typically do not include report parameters in a data object that you create for Report Studio users. Designing data objects for reports created with BIRT Designer Professional A data object can be used as a data source for BIRT reports. As an alternative to defining data sources and data sets for each report, you can create a data object that multiple reports share. This principle is similar to reports sharing data sources and data sets stored in a library. One design option is to create data objects by types of reports. For example, if creating a set of sales reports and a set of customer reports, create one data object to provide data for the first set of reports and another for the second set of reports. Report parameters in a data object cannot be reused in a report. A report parameter in a data object acts as a filter. You are prompted to specify a parameter value when you create a data source based on the data object, and the data source returns only the rows that meet the filter criteria. Building a data object Building a data object entails creating a data object file, then adding data sources, data sets, data models, and cubes to the data object. To add these data items, you can: ■ Create new data items within the data object. ■ Export data items in reports or libraries to the data object. Chapter 10, Creating data objects 191 How to create a data object 1 In the Report Design perspective, choose File➛New➛Data Object. 2 In New Data Object, do the following: 1 Select the folder in which to store the data object. 2 Edit the default file name to specify a new name. The extension must be .datadesign and the file name must not contain the following characters: [ ] * / \ : & ? Use a descriptive name that enables users to determine the contents of the data object. A descriptive name is particularly important if users have access to multiple data objects. 3 Choose Finish. A blank data object design appears. 3 Start adding data sources, data sets, data models, and cubes to the data object. Creating new items for a data object The procedures for creating data sources, data sets, cubes, and report parameters for a data object are the same as the procedures for creating these items for a report. Use Data Explorer to create, edit, and delete data items in a data object. Figure 10-2 shows a data object that contains one data source, two data sets, and two cubes. Figure 10-2 192 Provisioning Data Contents of a data object For information about building a data model, see “Building a data model,” later in this chapter. In this data object example, the Corporate_db data source connects to a corporate database. The ds_Revenue and ds_Expenses data sets retrieve data from the database. The Revenue and Expenses cubes use data from the ds_Revenue and ds_Expenses data sets, respectively. Exporting items to a data object An organization that creates and uses BIRT reports has data sources, data sets, and cubes available. As a data object designer, you can reuse these items by exporting them from a report or a library to a data object. The export utility copies the items to the data object. The exported items do not reference the original items in the report or the library. The export utility also detects and copies dependent items. For example, if you export a cube, the utility also exports the data source and data set that the cube uses. Be careful when exporting multiple data sources, data sets, or cubes from different sources. If the data object contains an item with the same name, BIRT warns of the name conflict and asks whether or not to overwrite the item in the data object. Overwrite the item only if you want to replace it. The overwrite action cannot be undone. If you select a data set or a cube to export, and BIRT displays the name-conflict message, the duplicate names apply not only to the selected data set or cube, but to any dependent item. For a cube, for example, the data set or data source used by the cube might have the same name as a data set or data source in the target data object. How to export data items to a data object 1 Open the report or library that contains the data items to export to a data object. 2 In Data Explorer, right-click a data item, then choose Export to Data Object, as shown in Figure 10-3. Figure 10-3 Exporting a data source to a data object Chapter 10, Creating data objects 193 Export Elements to Data Object—Select Data Object displays the data objects (.datadesign files) in the resource folder, if any exist, as shown in Figure 10-4. By default, BIRT uses the current project folder as the resource folder. Figure 10-4 Export Elements displaying the data objects in the resource folder 3 Perform one of the following steps: ■ Select an existing data object to which to export data items. ■ Create a new data object by specifying a file name in New File Name. BIRT saves the data object in the Data Objects subfolder of the resource folder. 4 Choose Next. 5 In Export Elements to Data Object—Select Elements, shown in Figure 10-5, select one or multiple data items to export. If you select a data set, BIRT also exports the data source that the data set uses. Similarly, if you select a cube, the associated data set and data source are exported. Figure 10-5 6 Choose Finish. 194 Provisioning Data Export Elements displaying the data items that you can export If BIRT does not detect any name conflicts between the items selected for export and the items in an existing data object, BIRT exports the items and asks if you want to open the data object. 7 Open the data object to review its contents. The exported data items appear in the data object. Creating a shared dimension for cubes If designing multiple cubes that contain the same dimension, create a shared dimension. For example, if one cube contains sales data by country, state, and city, and another cube contains budget data by country, state, and city, you can create a shared multi-level dimension that provides country, state, and city data. By using a shared dimension, you define and maintain dimension data in one place, and reuse the dimension in multiple cubes. Reusing a dimension also speeds up data processing. There are two ways to create a shared dimension. You can: ■ Create a new shared dimension. ■ Convert an existing cube dimension into a shared dimension. How to create a shared dimension This procedure assumes that you have already created a data object, as well as the data set that provides the data for the dimension. 1 Open the data object. 2 In Data Explorer, right-click Shared Dimensions, then choose New Shared Dimension. 3 In Dimension Builder, specify the following information: ■ In Dataset, select the data set that contains the columns to use in the dimension. ■ In Available Columns, drag a column and drop it in the following location: (Drop a field here to create a group) ■ In Add Group, type a name for the group. ■ If creating a multi-level dimension, drag and drop additional columns. Figure 10-6 shows an example of a multi-level dimension that contains country, state, and city data. Chapter 10, Creating data objects 195 Figure 10-6 Dimension Builder displaying a defined shared dimension Choose OK. The shared dimension appears under Shared Dimensions in Data Explorer and in the data object design. How to convert a cube dimension into a shared dimension This procedure assumes that you have already included cubes in a data object. 1 Open the data object. 2 In Data Explorer, expand the cube that contains the dimension to convert to a shared dimension, then right-click the dimension and choose Convert to Shared Dimension, as shown in Figure 10-7. Figure 10-7 Converting a dimension to a shared dimension The converted dimension appears under Shared Dimensions in Data Explorer and in the data object design, as shown in Figure 10-8. BIRT also replaces the original cube dimension with the shared dimension. 196 Provisioning Data Figure 10-8 Data Explorer and data object showing the shared dimension Configuring data set columns for summary tables Summary tables are commonly used in dashboards and Report Studio reports to display key summary information. Figure 10-9 shows an example of a summary table created in Report Studio. Order data is grouped by date, in quarterly intervals, and by product line. The price of each order is summed to display subtotals for each quarter and product line, as well as a grand total. Figure 10-9 Summary table Chapter 10, Creating data objects 197 To create a summary table quickly, users select a table’s auto-summarize feature, then select the data set columns whose data to group and aggregate. When the auto-summarize feature is enabled, the software performs the grouping and aggregating. To support this feature, you must configure each data set column to provide the software with the appropriate information to perform these tasks. For example, it makes sense to group sales data by order date or product line, but not by revenue. Conversely, it makes sense to aggregate revenue values, but not order date or product line values. To provide the appropriate information to generate a summary table, set the Analysis Type property of each data set column to one of the following values: ■ Dimension Use this analysis type to support the grouping of data in the column. For example, to display revenue by product line, set the product line column as a dimension. ■ Attribute An attribute describes the items associated with a dimension. For a product dimension, for example, attributes might include color, size, and price. When you set a column as an attribute, you must also specify the dimension column of which it is an attribute. The summary table cannot group data in an attribute column. ■ Measure Use this analysis type to support the aggregation of values in the column. For example, to calculate revenue totals, set the revenue column as a measure. If you do not set a column’s analysis type, BIRT Designer Professional assigns a value using the following criteria: ■ If the column contains numeric values, the analysis type is measure. ■ If the column contains string, date, or Boolean values, the analysis type is dimension. ■ If the column is a primary key, a foreign key, or an indexed column in a database, the analysis type is dimension even if the column contains numeric values. Sometimes, the default analysis type values do not provide sensible data for a summary table. To create a well-designed data object, it is necessary to review the analysis type for every column in the data set. The following are examples of problems with the default values: ■ 198 Not all numeric columns are suitable as measures. Sometimes, it makes sense to group on numeric values, such as order numbers or customer numbers. In these cases, you would change the column’s analysis type to dimension. Sometimes, it does not make sense to aggregate numeric data, such as manufacturer’s suggested retail price (MSRP), which might be more appropriate as an attribute of a product dimension. Provisioning Data ■ Not all string columns are suitable as dimensions. For example, if each value in a column, such as product code, is unique, it does not make sense to group on this column. Such a column is better defined as an attribute of a product dimension. How to set the analysis type of a data set column This procedure assumes that you have already created a data object and added a data set to it. 1 Open the data object. 2 In Data Explorer, double-click the data set to edit it. 3 In Edit Data Set, choose Output Columns, then double-click the column whose analysis type to set. Edit Output Column displays the properties of the selected column, as shown in Figure 10-10. Figure 10-10 Properties of a data set column 4 In Analysis Type, select Dimension, Attribute, or Measure. If you select Attribute, in the list box that displays