Transcript
vFabric SQLFire User's Guide
VMware vFabric SQLFire 1.1
VMware vFabric Suite 5.3
This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.
EN-001171-00
You can find the most up-to-date technical documentation on the VMware Web site at: http://www.vmware.com/support/ The VMware Web site also provides the latest product updates. If you have comments about this documentation, submit your feedback to:
[email protected]
Copyright © 2013 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
VMware, Inc. 3401 Hillview Ave. Palo Alto, CA 94304
Contents Part I: About the SQLFire User's Guide....................................................1 Part II: Supported Configurations and System Requirements...............3 Part III: Getting Started with vFabric SQLFire..........................................7 vFabric SQLFire in 15 Minutes.....................................................................................................................8
Chapter 1: Overview of vFabric SQLFire................................................13 Data Flow Diagram.....................................................................................................................................14 GemFire, Apache Derby, and SQL Components........................................................................................14
Chapter 2: Understanding the SQLFire Distributed System.................17 vFabric SQLFire Members..........................................................................................................................17 Servers, Peer Clients, and Server Groups..................................................................................................18 Discovery Mechanisms...............................................................................................................................18 Group Membership Service........................................................................................................................18 Replicated Tables and Partitioned Tables...................................................................................................19 Parallel Execution of Data-Aware Stored Procedures.................................................................................19 Cache Plug-ins for External Data Connections..........................................................................................20
Chapter 3: Installing vFabric SQLFire.....................................................21 Installation Note for vFabric Suite Customers.............................................................................................21 RHEL Only: Install vFabric SQLFire from an RPM.....................................................................................21 Install vFabric SQLFire from a ZIP File.......................................................................................................25
Chapter 4: Activating vFabric SQLFire Licensing.................................27 Understand vFabric SQLFire License Options...........................................................................................27 Choose a License Option Based on Topology............................................................................................28 How vFabric SQLFire Manages Licensing..................................................................................................30 Install and Configure vFabric SQLFire Licenses.........................................................................................31 Verify Your License and Check Your License Usage...................................................................................33
Chapter 5: Upgrading vFabric SQLFire..................................................35 Before You Upgrade....................................................................................................................................35 RHEL: Upgrade vFabric SQLFire from RPM..............................................................................................36 Upgrade vFabric SQLFire from a ZIP File..................................................................................................37 Version Compatibility Rules........................................................................................................................39
Chapter 6: Connect to vFabric SQLFire with JDBC Tools.....................................................41
Chapter 7: Tutorials..................................................................................43 Main Steps..................................................................................................................................................43 Create a SQLFire Cluster...........................................................................................................................43 Connect to the Cluster Using SQLF...........................................................................................................45 Create Replicated Tables and Execute Queries.........................................................................................46 Implement a Partitioning Strategy...............................................................................................................47 Persist Tables to Disk..................................................................................................................................49 Add Servers to the Cluster and Stop Servers.............................................................................................51 Perform Additional Tasks............................................................................................................................52
Chapter 8: vFabric SQLFire Features and Benefits...............................53 Part IV: Managing Your Data in vFabric SQLFire....................................55 Chapter 9: Designing vFabric SQLFire Databases................................57 Design Principles of Scalable, Partition-Aware Databases.........................................................................57 Identify Entity Groups and Partitioning Keys...............................................................................................57 Replicate Code Tables................................................................................................................................58 Dealing with Many-to-Many Relationships..................................................................................................58 Example: Adapting a Database Schema for SQLFire.................................................................................59
Chapter 10: Using Server Groups to Manage Data................................63 Server Groups Overview.............................................................................................................................63 Adding Members to Server Groups............................................................................................................65 Assigning Tables to Server Groups.............................................................................................................66
Chapter 11: Partitioning Tables...............................................................67 How Table Partitioning Works.....................................................................................................................67 Understanding Where Data Is Stored.........................................................................................................68 Failure and Redundancy.............................................................................................................................69 Creating Partitioned Tables.........................................................................................................................70 Rebalancing Partitioned Data on SQLFire Members..................................................................................75 Managing Replication Failures....................................................................................................................76
Chapter 12: Replicating Tables................................................................77 How SQLFire Replicates Tables.................................................................................................................77 Deciding When to Use Replicated Tables...................................................................................................77 Creating Replicated Tables.........................................................................................................................78
Chapter 13: Estimating Memory Requirements.....................................79 Estimating SQLFire Overhead....................................................................................................................79 Viewing Memory Usage in SYS.MEMORYANALYTICS..............................................................................80
Chapter 14: Using Disk Stores to Persist Data......................................................................85
Overview of Disk Stores..............................................................................................................................85 Guidelines for Designing Disk Stores.........................................................................................................91 Creating a Disk Store or Using the Default.................................................................................................92 Persist Table Data to a Disk Store...............................................................................................................92 Optimizing Availability and Performance.....................................................................................................93 Starting System with Disk Stores................................................................................................................93 Disk Store Management.............................................................................................................................96
Chapter 15: Exporting and Importing Data with vFabric SQLFire.....105 Using SQLF Commands to Export and Import Data................................................................................105 Using Apache DdlUtils to Import Data......................................................................................................107 Exporting and Importing Data from Text Files...........................................................................................110
Chapter 16: Using Table Functions to Import Data as a SQLFire Table..............................113 Overview of SQLFire Table Functions.......................................................................................113 Example Table Function............................................................................................................115
Part V: Developing Applications with SQLFire.....................................117 Chapter 17: Starting SQLFire Servers with the FabricServer Interface.119 Starting a Network Server........................................................................................................................120
Chapter 18: Developing Java Clients and Peers..................................123 Connect to a SQLFire Server with the Thin Client JDBC Driver...............................................................123 Start a SQLFire Peer with the Peer Client JDBC Driver...........................................................................126
Chapter 19: Configuring SQLFire as a JDBC Datasource..................129 Using SQLFire with Hibernate..................................................................................................................118
Chapter 20: Storing and Loading JAR Files in SQLFire......................131 Class Loading Overview...........................................................................................................................131 Alternate Methods for Managing JAR Files..............................................................................................133
Chapter 21: Developing ADO.NET Client Applications.......................137 About the ADO.NET Driver.......................................................................................................................137 ADO.NET Driver Classes..........................................................................................................................138 Installing and Using the ADO.NET driver..................................................................................................139 Connecting to SQLFire with the ADO.NET Driver....................................................................................140 Managing Connections.............................................................................................................................140 Executing SQL Commands.......................................................................................................................141 Working with Result Sets..........................................................................................................................141 Storing a Table..........................................................................................................................................142 Storing Multiple Tables..............................................................................................................................143 Specifying Command Parameters with SQLFParameter..........................................................................144 Updating Row Data...................................................................................................................................145
Adding Rows to a Table............................................................................................................................146 Managing SQLFire Transactions...............................................................................................................147 Performing Batch Updates........................................................................................................................148 Generic Coding with the SQLFire ADO.NET Driver..................................................................................149
Chapter 22: Using SQLFire.NET Designer...........................................153 Installing SQLFire.NET Designer..............................................................................................................153 Connecting to a SQLFire Distributed System...........................................................................................153 Editing Tables............................................................................................................................................154
Chapter 23: Understanding the Data Consistency Model...................155 Data Consistency Concepts.....................................................................................................................155 No Ordering Guarantee for DML in Separate Threads.............................................................................156 Updates on Any Row Are Atomic and Isolated.........................................................................................156 Atomicity for Bulk Updates........................................................................................................................156
Chapter 24: Using Distributed Transactions in Your Applications.....157 Overview of SQLFire Distributed Transactions.........................................................................................157 Sequence of Events for a Distributed Transaction....................................................................................160 SQLFire Transaction Design.....................................................................................................................161 Best Practices for Using Transactions.......................................................................................................161 Transaction Functionality and Limitations.................................................................................................162
Chapter 25: Using Data-Aware Stored Procedures.............................163 Configuring a Procedure...........................................................................................................................163 Configuring a Custom Result Processor...................................................................................................164 Invoking a Procedure................................................................................................................................165 Example JDBC Client...............................................................................................................................166
Chapter 26: Using the Procedure Provider API....................................169 Procedure Parameters..............................................................................................169 Populating OUT and INOUT Parameters..................................................................170 Populating Result Sets..............................................................................................170 Using the <local> and <global> Escape Syntax with Nested Queries......................................172
Chapter 27: Using the Custom Result Processor API.........................173 Implementing the ProcedureResultProcessor Interface ..........................................................................173 Example Result Processor: MergeSort.....................................................................................................173
Chapter 28: Programming User-Defined Types....................................177 Chapter 29: Using Result Sets and Cursors........................................181 Non-updatable, Forward-Only Result Sets...............................................................................................181 Updatable Result Sets..............................................................................................................................182 Scrollable Insensitive Result Sets.............................................................................................................187
Result Sets and Autocommit.....................................................................................................................188
Part VI: Caching Data with vFabric SQLFire.........................................189 Chapter 30: SQLFire Cache Strategies.................................................191 Chapter 31: Using a RowLoader to Load Existing Data......................193 How SQLFire Invokes a RowLoader.........................................................................................................193 Implementing the RowLoader Interface....................................................................................................194 Using the JDBCRowLoader Example.......................................................................................................194
Chapter 32: Evicting Table Data from SQLFire.....................................197 How LRU Eviction Works..........................................................................................................................197 Limitations of Eviction...............................................................................................................................197 Eviction in Partitioned Tables....................................................................................................................198 Create a Table with Eviction Settings........................................................................................................198
Chapter 33: Handling DML Events Synchronously.............................201 Writer and Listener Cache Plug-ins..........................................................................................................201 Example Writer Implementation................................................................................................................202 Example Listener Implementation............................................................................................................202
Chapter 34: Handling DML Events Asynchronously..........................203 How the AsyncEventListener Works.........................................................................................................203 Implementation Requirements..................................................................................................................203 Implementing an AsyncEventListener.......................................................................................................204
Chapter 35: Using DBSynchronizer to Apply DML to an RDBMS......209 How DBSynchronizer Works.....................................................................................................................209 Restrictions and Limitations......................................................................................................................210 Configuring DBSynchronizer....................................................................................................................211
Chapter 36: Suppressing Event Callbacks for a vFabric SQLFire Connection.215 Part VII: Deploying vFabric SQLFire......................................217 Chapter 37: SQLFire Deployment Models............................219 Embedded Peer-to-Peer Deployment.......................................................................................220 Client-Server Deployment.........................................................................................................222 Multi-site Deployment...............................................................................................................223
Chapter 38: Steps to Plan and Configure a Deployment....................227 Chapter 39: Configuring Discovery Mechanisms................................229
Using Locators..........................................................................................................................................229 Configure Multicast Discovery..................................................................................................................231
Chapter 40: Starting and Configuring SQLFire Servers.....................233 Start and Stop SQLFire Servers Using sqlf..............................................................................................233 Specify the Server Working Directory.......................................................................................................234 Specify Client Connection Information......................................................................................................234 Define Server Groups...............................................................................................................................234 Execute SQL When You Start a Server.....................................................................................................235 Using Additional Boot Properties..............................................................................................................235
Chapter 41: Configuring Multi-site (WAN) Deployments.....................237 About Gateways........................................................................................................................................237 About High Availability for WAN Deployments..........................................................................................239 Limitations of Multi-Site Replication..........................................................................................................240 Prerequisites for WAN Replication............................................................................................................241 Steps to Configure a Multi-site Deployment..............................................................................................241
Chapter 42: Configuring Authentication and Authorization...............249 Configuring User Authentication...............................................................................................................249 User Names in Authentication and Authorization ....................................................................................257 Configuring User Authorization.................................................................................................................258 Configuring Network Encryption and Authentication with SSL/TLS..........................................................262
Part VIII: Managing and Monitoring vFabric SQLFire..........................267 Chapter 43: Configuring and Using SQLFire Log Files.......................269 Log Message Format................................................................................................................................269 Severity Levels..........................................................................................................................................269 Using java.util.logging.Logger for Application Log Messages...................................................................270 Using Trace Flags for Advanced Debugging.............................................................................................270
Chapter 44: Querying SQLFire System Tables and Indexes...............273 Getting Information About SQLFire Members..........................................................................................273 Getting Information About User Tables.....................................................................................................274
Chapter 45: Evaluating Query Plans and Query Statistics.................279 Capture a Query Plan for an Individual Statement...................................................................................279 Capture Query Plans for All Statements...................................................................................................280 Example Query Plan Analysis..................................................................................................................280 Query Plan Codes....................................................................................................................................284
Chapter 46: Overriding Optimizer Choices..........................................287 Chapter 47: Evaluating System and Application Performance..........291
Collecting System Statistics......................................................................................................................291 Collecting Application Statistics................................................................................................................292 Using VSD to Analyze Statistics...............................................................................................................292
Chapter 48: Using Java Management Extensions (JMX)....................303 Using a JMX Manager Node.....................................................................................................................303 Using a vFabric SQLFire JMX Agent........................................................................................................306
Chapter 49: Best Practices for Tuning Performance...........................319 Tune Application Logic..............................................................................................................................319 Reduce Distribution Overhead..................................................................................................................319 Reduce Overhead of Eviction to Disk.......................................................................................................320 Minimize Update Latency for Replicated Tables.......................................................................................320 Tune FabricServers...................................................................................................................................320 Tuning Disk I/O.........................................................................................................................................321 Running SQLFire in Virtualized Environments..........................................................................................322
Chapter 50: Detecting and Handling Network Segmentation ("Split Brain").323 Part IX: vFabric SQLFire Reference......................................................325 Chapter 51: Configuration Properties..................................................327 Chapter 52: JDBC API............................................................................365 Core JDBC Classes, Interfaces, and Methods.........................................................................................365 Mapping java.sql.Types to SQL Types......................................................................................................370 JDBC Escape Syntax ..............................................................................................................................371
Chapter 53: sqlf Launcher Commands................................................379 sqlf backup................................................................................................................................................381 sqlf compact-all-disk-stores......................................................................................................................383 sqlf compact-disk-store.............................................................................................................................384 sqlf encrypt-password...............................................................................................................................385 sqlf install-jar.............................................................................................................................................387 sqlf list-missing-disk-stores.......................................................................................................................390 sqlf locator................................................................................................................................................391 sqlf Logging Support.................................................................................................................................396 sqlf merge-logs.........................................................................................................................................396 sqlf remove-jar..........................................................................................................................................397 sqlf replace-jar..........................................................................................................................................399 sqlf revoke-missing-disk-store...................................................................................................................401 sqlf run......................................................................................................................................................403 sqlf server.................................................................................................................................................405 sqlf show-disk-store-metadata..................................................................................................................416 sqlf shut-down-all......................................................................................................................................417
sqlf stats....................................................................................................................................418
sqlf upgrade-disk-store.............................................................................................................422
sqlf validate-disk-store..............................................................................................................423
sqlf version................................................................................................................................423
sqlf write-data-dtd-to-file...........................................................................................................424
sqlf write-data-to-db..................................................................................................................426
sqlf write-data-to-xml................................................................................................................431
sqlf write-schema-to-db............................................................................................................435
sqlf write-schema-to-sql............................................................................................................439
sqlf write-schema-to-xml...........................................................................................................443
Chapter 54: sqlf Interactive Commands...............................................449 absolute ...................................................................................................................................................449 after last ...................................................................................................................................................450 async .......................................................................................................................................................451 autocommit ..............................................................................................................................................451 before first ................................................................................................................................................452 close ........................................................................................................................................................452 commit .....................................................................................................................................................453 connect ....................................................................................................................................................453 connect client ...........................................................................................................................................454 connect peer ............................................................................................................................................455 describe ...................................................................................................................................................455 disconnect ................................................................................................................................................456 driver ........................................................................................................................................................457 elapsedtime .............................................................................................................................................457 execute ....................................................................................................................................................458 exit ...........................................................................................................................................................459 first............................................................................................................................................................460 get scroll insensitive cursor.......................................................................................................................460 GetCurrentRowNumber ...........................................................................................................................462 help ..........................................................................................................................................................462 last ...........................................................................................................................................................463 
LocalizedDisplay.......................................................................................................................................463 MaximumDisplayWidth.............................................................................................................................464 next...........................................................................................................................................................465 prepare ....................................................................................................................................................465 previous....................................................................................................................................................466 protocol.....................................................................................................................................................467 relative......................................................................................................................................................468 remove......................................................................................................................................................468 rollback......................................................................................................................................................469 run.............................................................................................................................................................470 set connection...........................................................................................................................................471 show..........................................................................................................................................................471 wait for......................................................................................................................................................475
Chapter 55: SQLFire API........................................................................477 CredentialInitializer...................................................................................................................................478 UserAuthenticator.....................................................................................................................................478 Procedure Implementation Interfaces.......................................................................................................478 Procedure Result Processor Interfaces....................................................................................................480
Chapter 56: SQL Language Reference.................................................485 Keywords and Identifiers...........................................................................................................................485 SQL Statements.......................................................................................................................................486 SQL Queries.............................................................................................................................................546 SQL Clauses.............................................................................................................................................547 SQL Expressions......................................................................................................................................552 JOIN Operations.......................................................................................................................................565 Built-in Functions......................................................................................................................................567 Built-in System Procedures......................................................................................................................609 Data Types................................................................................................................................................638 SQL Standards Conformance...................................................................................................................649
Chapter 57: System Tables.....................................................................669 ASYNCEVENTLISTENERS.....................................................................................................................669 GATEWAYRECEIVERS............................................................................................................................670 GATEWAYSENDERS...............................................................................................................................671 INDEXES..................................................................................................................................................673 JARS.........................................................................................................................................................673 MEMBERS................................................................................................................................................674 MEMORYANALYTICS...............................................................................................................................675 STATEMENTPLANS.................................................................................................................................676 SYSALIASES............................................................................................................................................677 SYSCHECKS............................................................................................................................................678 SYSCOLPERMS......................................................................................................................................678 SYSCOLUMNS.........................................................................................................................................680 SYSCONGLOMERATES..........................................................................................................................681 SYSCONSTRAINTS.................................................................................................................................682 SYSDEPENDS.........................................................................................................................................682 SYSDISKSTORES....................................................................................................................................683 SYSFILES.................................................................................................................................................684 SYSFOREIGNKEYS.................................................................................................................................684 SYSKEYS.................................................................................................................................................685 SYSROLES...............................................................................................................................................685 SYSROUTINEPERMS..............................................................................................................................687 SYSSCHEMAS.........................................................................................................................................687 SYSSTATEMENTS...................................................................................................................................688 
SYSSTATISTICS.......................................................................................................................................688 SYSTABLEPERMS...................................................................................................................................689 SYSTABLES.............................................................................................................................................691
SYSTRIGGERS........................................................................................................................................693 SYSVIEWS...............................................................................................................................................694
Chapter 58: Exception Messages and SQL States..............................695 Chapter 59: ADO.NET Driver Reference...............................................743 SQLFire Data Types in ADO.NET.............................................................................................................743 VMware.Data.SQLFire.BatchUpdateException........................................................................................743 VMWare.Data.SQLFire.SQLFClientConnection.......................................................................................743 VMware.Data.SQLFire.SQLFCommand...................................................................................................748 VMware.Data.SQLFire.SQLFCommandBuilder........................................................................................754 VMware.Data.SQLFire.SQLFType............................................................................................................754 VMware.Data.SQLFire.SQLFDataAdapter...............................................................................................755 VMware.Data.SQLFire.SQLFDataReader................................................................................................757 VMware.Data.SQLFire.SQLFException....................................................................................................759 VMware.Data.SQLFire.SQLFParameter...................................................................................................760 VMware.Data.SQLFire.SQLFParameterCollection...................................................................................760 VMware.Data.SQLFire.SQLFTransaction.................................................................................................761
Chapter 60: vFabric SQLFire Limitations.............................................765 SQL Language Limitations.......................................................................................................................765 ADO.NET Driver Limitations.....................................................................................................................773
Part X: Troubleshooting Common Problems.......................................775 Part XI: vFabric SQLFire Glossary........................................................779
Part 1
About the SQLFire User's Guide
Revised February 18, 2014.
The SQLFire User's Guide provides step-by-step procedures for installing, configuring, managing, and developing applications with VMware® vFabric™ SQLFire. The guide also provides a complete reference for the SQLFire tools, supported SQL statements, and APIs.
Note: All documentation for Pivotal SQLFire 1.1.2 and later is available at http://docs.gopivotal.com/sqlfire/index.html.
Intended Audience
The SQLFire User's Guide is intended for anyone who wants to install or deploy SQLFire, and for developers who want to program applications that access a SQLFire system and/or implement SQLFire APIs. The guide assumes that you are familiar with SQL databases and database terminology, and that you have experience in developing database applications using Java or .NET technologies.
Part 2
Supported Configurations and System Requirements
Before installing vFabric SQLFire, make sure your system meets the minimum system requirements for installing and running the product.
• Supported Configurations on page 3
• Host Machine Requirements on page 4
• Increase Unicast Buffer Size on Linux Platforms on page 4
• Disable SYN Cookies on Linux Platforms on page 4
• Client Requirements on page 5

Supported Configurations
The following table shows all supported configurations for vFabric SQLFire. These include Supported Configurations and System Requirements for vFabric Suite.
Note: The tables indicate whether the supported configuration is for production or development. Generally, production support means you can run your production application on the platform; developer support means you can develop on the platform, but you should not run your production application on it.
Operating System                                       |Processor Architecture  |JVM                    |Production or Developer Support
----------------------------------------------------------------------------------------------------------------------------------------
Red Hat EL 5                                           |x86 (64 bit and 32 bit) |Java SE 6, update 0_26 |Production
Red Hat EL 6                                           |x86 (64 bit and 32 bit) |Java SE 6, update 0_26 |Production
Windows Server 2003 R2 SP2 Professional or Enterprise* |x86 (64 bit and 32 bit) |Java SE 6, update 0_26 |Production
Windows Server 2008 R2 Standard*                       |x86 (64 bit and 32 bit) |Java SE 6, update 0_26 |Production
Windows 7 SP1 Professional or Enterprise*              |x86 (64 bit and 32 bit) |Java SE 6, update 0_26 |Developer
Windows XP                                             |x86 (64 bit)            |Java SE 6              |Developer
Ubuntu 10.04                                           |x86 (64 bit)            |Java SE 6              |Developer
*The Microsoft Loopback Adapter is not supported.
Note: The SQLFire product download does not include Java; you must download and install a supported JDK for your system.

Host Machine Requirements
Requirements for each host:
• A supported Java SE installation.
• File system that supports long file names.
• Adequate per-user quota of file handles (ulimit for Linux).
• TCP/IP.
• System clock set to the correct time.
• For each Linux host, the hostname and host files must be properly configured. See the system manpages for hostname and hosts.
• Time synchronization service such as Network Time Protocol (NTP).
Note: For troubleshooting, you must run a time synchronization service on all hosts. Synchronized time stamps allow you to merge log messages from different hosts for an accurate chronological history of a distributed run.

Increase Unicast Buffer Size on Linux Platforms
On Linux platforms, execute the following commands as the root user to increase the unicast buffer size:
1. Edit the /etc/sysctl.conf file to include the following lines:
   net.core.rmem_max=1048576
   net.core.wmem_max=1048576
2. Reload sysctl.conf:
   sysctl -p
Disable SYN Cookies on Linux Platforms
Many default Linux installations use SYN cookies to protect the system against malicious attacks that flood TCP SYN packets. The use of SYN cookies dramatically reduces network bandwidth, and can be triggered by a running SQLFire distributed system. If your SQLFire distributed system is otherwise protected against such attacks, disable SYN cookies to ensure that SQLFire network throughput is not affected.
To disable SYN cookies permanently:
1. Edit the /etc/sysctl.conf file to include the following line:
   net.ipv4.tcp_syncookies = 0
   Setting this value to zero disables SYN cookies.
2. Reload sysctl.conf:
   sysctl -p
Client Requirements
SQLFire supports two JDBC drivers: a thin client JDBC driver and a peer JDBC driver. SQLFire server instances and the peer driver are supported only with Java SE 6. You can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.htm.
SQLFire provides a managed ADO.NET driver that you can use for developing non-Java client applications. The ADO.NET driver uses IKVM technology to translate SQLFire JDBC core libraries to .NET MSIL. The ADO.NET driver is supported for applications built using the Microsoft .NET 2.0 SP1 or higher framework.
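As a quick illustration of the thin client JDBC driver mentioned above, the following minimal sketch connects to a SQLFire network server and lists the system members. It is a sketch only, not part of the requirements: it assumes a server or locator is listening on localhost:1527, that the SQLFire client JAR is on the classpath, and that the thin-client driver class name and URL format are as described in Developing Java Clients and Peers.

   // Minimal thin-client sketch; the driver class name and URL are assumptions
   // taken from the Developing Java Clients and Peers chapter.
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   public class QuickClient {
       public static void main(String[] args) throws Exception {
           // Load the thin client JDBC driver.
           Class.forName("com.vmware.sqlfire.jdbc.ClientDriver");
           Connection conn = DriverManager.getConnection("jdbc:sqlfire://localhost:1527/");
           try {
               Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery("select id, kind from sys.members");
               while (rs.next()) {
                   System.out.println(rs.getString(1) + " | " + rs.getString(2));
               }
               rs.close();
               stmt.close();
           } finally {
               conn.close();
           }
       }
   }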
Part 3
Getting Started with vFabric SQLFire
Getting Started with vFabric SQLFire provides step-by-step procedures for installing, configuring, and using VMware® vFabric™ SQLFire. The guide also explains main concepts and provides tutorials to help you quickly begin using SQLFire.
Topics:
• vFabric SQLFire in 15 Minutes
• Overview of vFabric SQLFire
• Understanding the SQLFire Distributed System
• Installing vFabric SQLFire
• Activating vFabric SQLFire Licensing
• Upgrading vFabric SQLFire
• Connect to vFabric SQLFire with JDBC Tools
• Tutorials
• vFabric SQLFire Features and Benefits
vFabric SQLFire in 15 Minutes Need a quick introduction to vFabric SQLFire? Take this 15-minute tour to try out the basic features and functionality. The vFabric SQLFire tutorials expand on these concepts, and demonstrate additional product features. See Tutorials on page 43. 1. Download the latest SQLFire 1.1 ZIP file distribution from the download page: https://my.vmware.com/web/vmware/info/slug/application_platform/vmware_vfabric_sqlfire/1_0. Save the downloaded file in your home directory. 2. Install SQLFire 1.1 by uncompressing the ZIP file: $ cd ~ $ unzip vFabric_SQLFire_11_bNNNNN.zip Substitute the exact filename that you downloaded. This installs SQLFire in a new vFabric_SQLFire_11_b NNNNN subdirectory in your home directory, where NNNNN is the specific SQLFire build number that you downloaded. 3. If you have not already done so, download and install Java. For a list of Java versions supported with this release of vFabric SQLFire, see Supported Configurations and System Requirements on page 3. 4. Set your PATH environment variable to include the bin subdirectory of the vFabric SQLFire directory. For example: $ export PATH=$PATH:/home/username/vFabric_SQLFire_11_bNNNNN/bin 5. Change to the SQLFire installation quickstart directory, and create three new directories for the locator and two servers that will make up the SQLFire distributed system: $ cd ~/vFabric_SQLFire_11_bNNNNN/quickstart $ mkdir locator1 server1 server2 6. Start the locator: $ sqlf locator start -peer-discovery-address=localhost -dir=locator1 Starting network server for SQLFire Locator at address localhost/127.0.0.1[1527] SQLFire Locator pid: 8787 status: running Logs generated in /home/yozie/vFabric_SQLFire_11_bNNNNN/quickstart/locator1/sqlflocator.log This command starts a default locator that accepts connections on the localhost address. The default port of 10334 is used for communication with other members of the distributed system. (You can double-check that this port is used by examining the locator1/sqlflocator.log file.) All new members of the distributed system must specify this locator's address and peer discovery port, localhost[10334], in order to join the system. The default port of 1527 is used for client connections to the distributed system. 7. Start both servers: $ sqlf server start -locators=localhost[10334] -bind-address=localhost -client-port=1528 -dir=server1 $ sqlf server start -locators=localhost[10334] -bind-address=localhost -client-port=1529 -dir=server2 Starting SQLFire Server using locators for peer discovery: localhost[10334] Starting network server for SQLFire Server at address localhost/127.0.0.1[1528] 8
   SQLFire Server pid: 8897 status: running
   Logs generated in /home/yozie/vFabric_SQLFire_11_bNNNNN/quickstart/server1/sqlfserver.log
   Starting SQLFire Server using locators for peer discovery: localhost[10334]
   Starting network server for SQLFire Server at address localhost/127.0.0.1[1529]
   SQLFire Server pid: 9003 status: running
   Logs generated in /home/yozie/vFabric_SQLFire_11_bNNNNN/quickstart/server2/sqlfserver.log

   Both servers also bind to the localhost address. They must specify unique client ports in order to avoid conflicts with the locator's default client port. As an alternative, they could disable the network server entirely by specifying -run-netserver=false, and all clients would need to connect through the locator.

8. Before going any further, check to make sure that you're in the SQLFire quickstart subdirectory. You'll need to run the script files in this directory later in the tutorial, and you must execute those scripts from within the quickstart directory:

   $ cd ~/vFabric_SQLFire_11_bNNNNN/quickstart

9. Connect to the distributed system as a thin client, and display information about the system members:

   $ sqlf
   sqlf> connect client 'localhost:1527';

10. Now that you're connected to the system, run a simple query to display information about the SQLFire system members:

   sqlf> select id, kind, netservers from sys.members;
   ID                     |KIND             |NETSERVERS
   ------------------------------------------------------------------------------
   localhost(17355):1374  |locator(normal)  |localhost/127.0.0.1[1527]
   localhost(17535):52946 |datastore(normal)|localhost/127.0.0.1[1529]
   localhost(17438):1230  |datastore(normal)|localhost/127.0.0.1[1528]

   3 rows selected

   By default, SQLFire servers are started as datastores, so that they can host database schemas. In this cluster, you can connect as a client to any member by specifying localhost with the unique port number of the member (the one specified in the NETSERVERS column). However, connecting to the locator provides basic load balancing by routing the connection request to an available server member.

11. Create a simple table and insert a few rows:

   sqlf> create table quicktable (id int generated always as identity, item char(25));
   0 rows inserted/updated/deleted
   sqlf> insert into quicktable values (default, 'widget');
   1 row inserted/updated/deleted
   sqlf> insert into quicktable values (default, 'gadget');
   1 row inserted/updated/deleted
   sqlf> select * from quicktable;
   ID         |ITEM
   -------------------------------------
   2          |gadget
   1          |widget

   2 rows selected
12. By default, SQLFire replicates new tables that you create onto data store members. You can validate this using the query:

   sqlf> select tablename, datapolicy from sys.systables where tablename='QUICKTABLE';
   TABLENAME           |DATAPOLICY
   -----------------------------------------------------------------------------
   QUICKTABLE          |REPLICATE

   1 row selected

13. Execute two SQL scripts to generate a schema that has both replicated and partitioned tables, and then load the schema with data:

   sqlf> run 'create_colocated_schema.sql';
   sqlf> run 'loadTables.sql';

   You see numerous messages as various SQL commands are executed. The first script creates both replicated and partitioned tables, as you can see using the query:

   sqlf> select tablename, datapolicy from sys.systables where tableschemaname='APP';
   TABLENAME           |DATAPOLICY
   ------------------------------------------------------------------------------
   FLIGHTS_HISTORY     |PARTITION
   FLIGHTAVAILABILITY  |PARTITION
   FLIGHTS             |PARTITION
   MAPS                |REPLICATE
   CITIES              |REPLICATE
   COUNTRIES           |REPLICATE
   AIRLINES            |REPLICATE
   QUICKTABLE          |REPLICATE

   8 rows selected

14. To observe a benefit of table partitioning, look at a query plan that involves one of the partitioned tables. Use the EXPLAIN command with a query to generate a query execution plan:

   sqlf> explain select * from flights;

   The EXPLAIN command stores the query execution plan for the statement in the SYS.STATEMENTPLANS system table.
15. To view the details of the query plan, disconnect as a thin client from the distributed system, and then reconnect as a peer client. A peer client participates as a member of the SQLFire distributed system and can coordinate queries, but it does not host any actual data. Execute these commands:

   sqlf> disconnect;
   sqlf> connect peer 'host-data=false;locators=localhost[10334]';

   You can see that your peer client connection introduces a new member to the distributed system:

   sqlf> select id, kind, netservers from sys.members;
   ID                     |KIND             |NETSERVERS
   ------------------------------------------------------------------------------
   localhost(17355):1374  |locator(normal)  |localhost/127.0.0.1[1527]
   localhost(17438):1230  |datastore(normal)|localhost/127.0.0.1[1528]
   localhost(17535):52946 |datastore(normal)|localhost/127.0.0.1[1529]
   10.0.1.18(17894):22695 |accessor(normal) |

   4 rows selected

   The term accessor indicates that the member only accesses data, but does not store data for the distributed system.

16. To view the query execution plan that you generated earlier, query the SYS.STATEMENTPLANS table to view the statement ID (STMT_ID), then use EXPLAIN again with the ID to view the plan:

   sqlf> select stmt_id, stmt_text from sys.statementplans;
   STMT_ID                             |STMT_TEXT
   -------------------------------------------------------------------------------
   00000001-ffff-ffff-ffff-00020000004c| select * from flights

   1 row selected

   sqlf> explain '00000001-ffff-ffff-ffff-00020000004c';
   member localhost(17438):1230 begin_execution 2013-02-27 15:33:30.759 end_execution 2013-02-27 15:33:30.779
   QUERY-RECEIVE execute_time 19.440818 ms member_node localhost(17535):52946
   RESULT-SEND execute_time 0.130708 ms member_node localhost(17535):52946
   RESULT-HOLDER execute_time 10.600616 ms returned_rows 275 no_opens 1
   TABLESCAN (100.00%) execute_time 3.250588 ms returned_rows 275 no_opens 1
   scan_qualifiers None scanned_object APP.FLIGHTS scan_type HEAP
   member localhost(17535):52946 begin_execution 2013-02-27 15:33:30.758 end_execution 2013-02-27 15:33:30.89
   QUERY-SCATTER execute_time 13.358717 ms member_node localhost(17438):1230,localhost(17535):52946
   QUERY-SEND execute_time 3.345079 ms member_node localhost(17438):1230
   QUERY-SEND execute_time 1.140207 ms member_node localhost(17535):52946
   RESULT-RECEIVE execute_time 0.008155 ms member_node localhost(17535):52946
   RESULT-RECEIVE execute_time 1.4887 ms member_node localhost(17438):1230
   SEQUENTIAL-ITERATION (35.23%) execute_time 10.463731 ms
   returned_rows 542 no_opens 1
   RESULT-HOLDER execute_time 0.490328 ms returned_rows 267 no_opens 1 member_node localhost(17535):52946
   RESULT-HOLDER execute_time 1.65846 ms returned_rows 275 no_opens 1 member_node localhost(17438):1230
   DISTRIBUTION-END (64.76%) execute_time 19.233818 ms returned_rows 542
   Local plan:
   member localhost(17535):52946 begin_execution 2013-02-27 15:33:30.762 end_execution 2013-02-27 15:33:30.842
   TABLESCAN (100.00%) execute_time 3.297607 ms returned_rows 267 no_opens 1
   scan_qualifiers None scanned_object APP.FLIGHTS scan_type HEAP

   Note: The generated statement ID may be different on your system. Copy the exact ID from the output of the SELECT statement and paste it into the second EXPLAIN statement.

   The plan describes exactly how SQLFire performed the query. Note the two QUERY-SEND entries. These entries indicate that the results for the query were obtained from each of the two data store members in the distributed system. Because the FLIGHTS table was created as a partitioned table, new rows that are added to the table are uniquely assigned to partitions based on the partitioning key (in this case, the key is the FLIGHT_ID column). Partitions are then placed on data stores, which can process their portion of the data independently when queries are performed against the table. Results from multiple data stores are then merged on a single query coordinator member to provide the final result set.

17. Either continue executing queries against the sample database, or shut down the SQLFire distributed system. To shut down all members of the system, first use the shut-down-all command to stop data stores and accessors. Then shut down any remaining locators:

   sqlf> quit;
   $ sqlf shut-down-all -locators=localhost[10334]
   Connecting to distributed system: locators=localhost[10334]
   Successfully shut down 2 members
   $ sqlf locator stop -dir=locator1
   The SQLFire Locator has stopped.

18. To continue learning about vFabric SQLFire, read or work through the remaining Tutorials on page 43.
Chapter 1: Overview of vFabric SQLFire

vFabric SQLFire is a memory-optimized, distributed database management system designed for applications that have demanding scalability and availability requirements. Applications can manage database tables entirely in memory, or they can persist tables to disk to reload the data after restarting the system. A SQLFire distributed system can be easily scaled out using commodity hardware.
Data Flow Diagram
GemFire, Apache Derby, and SQL Components vFabric SQLFire incorporates core vFabric GemFire technology and Apache Derby RDBMS components to provide a high-performance, distributed database management system. SQLFire extends standard SQL statements where necessary for creating and managing tables and configuring the SQLFire system. The sections that follow document how SQLFire utilizes the GemFire and Derby component functionality. vFabric GemFire Technology vFabric SQLFire incorporates the following VMware vFabric GemFire technology: • Reliable data distribution • High performance replication and partitioning • Caching framework • Parallel 'data-aware' application behavior routing
The SQLFire community site provides a comparison of SQLFire to other data management systems, such as vFabric GemFire.

Apache Derby RDBMS Components

SQLFire integrates vFabric GemFire functionality with several components of the Apache Derby relational database management system (RDBMS):

• JDBC driver. SQLFire supports a native, high-performance JDBC driver (peer driver) and a thin JDBC driver. The peer driver is based on the Derby embedded driver and JDBC 4.0 interfaces, but all communication with SQLFire servers is implemented through the vFabric GemFire distribution layer.
• Query engine. SQLFire uses Derby to parse the SQL queries and generate parse trees. SQLFire injects its own logic for intermediate plan creation and distributes the plan to data stores in the cluster. SQLFire also capitalizes on some aspects of the built-in optimizer in Derby to generate query plans. The query execution itself uses memory-based indexes and custom storage data structures. When query execution requires distribution, SQLFire uses a custom algorithm to execute the query in parallel on multiple data stores.
• Network server. SQLFire servers embed the Derby network server for connectivity from thin JDBC and ADO.NET clients. The communication protocol is based on the DRDA standard that is used by IBM DB2 drivers.

SQL Extensions

vFabric SQLFire modifies and extends the query engine and SQL interface to provide support for partitioned and replicated tables, data-aware procedures, data persistence, data eviction, and other features unique to the distributed SQLFire architecture. SQLFire also adds SQL commands, stored procedures, system tables, and functions to help you easily manage features of the distributed system, such as persistent disk stores, listeners, and locators.
Chapter 2: Understanding the SQLFire Distributed System

A SQLFire deployment consists of distributed member processes that connect to each other to form a peer-to-peer network, also known as a distributed system or SQLFire cluster. The sections that follow explain the interactions of main system components and processes. Tutorials on page 43 help you get started configuring and using a SQLFire distributed system.
vFabric SQLFire Members

Member processes form a single, logical system, and each member has single-hop access to any other member, with single-hop or no-hop access to data. A SQLFire member is an instance of the SQLFire code that runs in a JVM. A SQLFire member can optionally host data, provide network server functionality for client connections, and provide location services for the distributed system. A SQLFire distributed system is dynamic, and members can be added or removed at any time. The SQLFire implementation guarantees a consistent view of the distributed system to ensure that data consistency and data integrity are not compromised.

Most SQLFire members are configured to host data, and are referred to as data stores. Members that are configured to not host data are referred to as accessors. Both data stores and accessors can execute the DDL and DML commands that SQLFire supports. Data stores provide single-hop or no-hop access to data that is stored on members of the distributed system. Accessors provide single-hop access to data stores in the distributed system. Data stores and accessors are licensed separately. (A sketch of an accessor startup command appears after the list below.)

A third type of member, the standalone locator, does not host data and does not support DDL and DML statements on user-defined tables. You use locators to discover members of the SQLFire cluster. A SQLFire agent is an optional distributed system member that provides access to JMX MBeans for monitoring and managing SQLFire.

For more information, see:
• Starting and Configuring SQLFire Servers on page 233
• Using Locators on page 229
• Start a SQLFire Peer with the Peer Client JDBC Driver on page 126
• Using a vFabric SQLFire JMX Agent on page 306
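As a rough illustration of the data store/accessor distinction, a server can be started as an accessor by setting the host-data boot property to false on the command line. This is a sketch only: the directory name is hypothetical, the locator address follows the quickstart examples, and the complete set of startup options is described in Starting and Configuring SQLFire Servers on page 233.

   $ sqlf server start -host-data=false -locators=localhost[10334] -dir=accessor1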
Servers, Peer Clients, and Server Groups

A SQLFire server is a process that hosts data and is a member of a peer-to-peer distributed system. SQLFire servers run in Java Virtual Machines (JVMs). You start a SQLFire server using the sqlf tool from a command prompt or terminal window. sqlf launches servers as standalone processes that are akin to database servers. The servers can accept TCP connections from thin clients, authenticate credentials, manage sessions, delegate work to a thread pool for SQL processing, and so forth.

A peer client, also known as an embedded client, is a SQLFire-aware JDBC client that connects to the distributed system using the JDBC peer driver. Peer clients are always connected to the distributed system, and they have single-hop access to data. A peer client can be configured as a pure client (referred to as an accessor member) or as a client that also hosts data (a data store). Both JDBC peer client processes and server processes are peer members of the distributed system. The members discover each other dynamically through a built-in multicast-based discovery mechanism, or by using a locator service when TCP discovery is more desirable.

Note: In addition to peer client members that participate in a cluster, SQLFire servers support thin client connections from JDBC and ADO.NET thin client drivers. See Developing Java Clients and Peers on page 123 and Developing ADO.NET Client Applications on page 137.

SQLFire servers and peer clients that host data (when the host-data property is set to true) are automatically part of the default server group. A server group is a logical grouping of SQLFire server and peer client members that defines the data stores that should host data for a table. When you create a SQLFire object such as a table, you can specify a server group name in the CREATE TABLE statement; the table is then hosted on the data stores in that group. If no group is specified, the table is hosted in the default server group. Using Server Groups to Manage Data on page 63 provides additional information. (A DDL sketch appears after the list below.)

For more information, see:
• Starting and Configuring SQLFire Servers on page 233
• Using Locators on page 229
• Start a SQLFire Peer with the Peer Client JDBC Driver on page 126
• Using Server Groups to Manage Data on page 63
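For illustration, the following DDL sketch restricts a table to a named server group. The table and group names are hypothetical; the SERVER GROUPS clause, and the startup options that place members in groups, are covered in Using Server Groups to Manage Data on page 63.

   create table orders (
     order_id int primary key,
     customer_id int,
     amount decimal(10,2)
   ) server groups (ordersgroup);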
Discovery Mechanisms A peer member (a server or peer client process) announces itself to the distributed system using one of two mechanisms. SQLFire provides these discovery mechanisms: • Locator (TCP/IP). A locator service maintains a registry of all peer members in the distributed system at any given moment. A locator is typically started as a separate process (with redundancy), but you can also embed a locator in any peer member, such as a SQLFire server. The locator opens a TCP port through which all new members connect to get initial membership information. • UDP/IP Multicast. Members can optionally use a multicast address to broadcast their presence and receive membership notification information. Configuring Discovery Mechanisms on page 229 provides more information.
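As a rough side-by-side of the two discovery mechanisms, the commands below start a server first with locator-based discovery and then with multicast discovery. This is a sketch only: the locator address matches the quickstart examples, while the multicast address and port are placeholder values and assume that the mcast-address and mcast-port properties can be passed on the sqlf command line like other boot properties.

   $ sqlf server start -locators=localhost[10334] -dir=server1
   $ sqlf server start -mcast-address=239.192.81.1 -mcast-port=12777 -dir=server1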
Group Membership Service The Group Membership Service (GMS) uses self-defined system membership. Processes can join or leave the distributed system at any time. The GMS communicates this information to every other member in the system,
with certain consistency guarantees. Each member in the group participates in membership decisions, which ensures that either all members see a new member or no members see it. The membership coordinator, a key component of the GMS, handles "join" and "leave" requests, and also handles members that are suspected of having left the system. The system automatically elects the oldest member of the distributed system to act as the coordinator, and it elects a new one if the member fails or is unreachable. The coordinator's basic purpose is to relay the current membership view to each member of the distributed system and to ensure the consistency of the view at all times.

Because the SQLFire distributed system is dynamic, you can add or remove members in a very short time period. This makes it easy to reconfigure the system to handle added demand (load). The GMS permits the distributed system to progress under conditions in which a statically-defined membership system could not. A static model defines members by host and identity, which makes it difficult to add or remove members in an active distributed system. The system would have to shut down partially or completely to expand or contract the number of members that participate in the system.

For more information, see:
• Start and Stop SQLFire Servers Using sqlf on page 233
• Connect to a Distributed System Using Locators on page 231
• Rebalancing Partitioned Data on SQLFire Members on page 75
Replicated Tables and Partitioned Tables Tables in SQLFire can be partitioned or replicated. A replicated table keeps a copy of its entire data set locally on every SQLFire server in its server group. A partitioned table manages large volumes of data by partitioning it into manageable chunks and distributing those chunks across all members in the table's server group. By default, all tables are replicated unless you specify partitioning in the CREATE TABLE statement. The schema information for all SQLFire objects is visible at all times to all peer members of the distributed system including peer clients, but excluding standalone locators. Partitioning Tables on page 67 and Replicating Tables on page 77 provide more information.
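For illustration, the following DDL sketch creates one partitioned table and one replicated table. The table definitions are hypothetical; the PARTITION BY and REPLICATE clauses are described in Partitioning Tables on page 67 and Replicating Tables on page 77.

   -- hypothetical partitioned table: rows are spread across the server group by flight_id
   create table flights_example (
     flight_id char(6) primary key,
     origin char(3),
     destination char(3)
   ) partition by column (flight_id);

   -- hypothetical replicated table: every data store in the group keeps a full copy
   create table countries_example (
     country_code char(2) primary key,
     country_name varchar(50)
   ) replicate;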
Parallel Execution of Data-Aware Stored Procedures In a traditional relational database, stored procedures are application routines that are stored as part of the data dictionary and executed on the database system itself. Stored procedures generally offer high performance because they execute in close proximity to data required by the application logic. SQLFire extends this basic stored procedure capability to support parallel execution of application logic on table data that is partitioned across many peers. SQLFire applications can execute stored procedures on specific data hosts, in parallel on all the members of a server group, or can target specific members based on the data requirements for the procedure. Essentially, application behavior that is encapsulated in stored procedures is moved to the process that hosts the associated data set, and it is executed there. If the required data set is spread across multiple partitions, the procedure is executed in parallel on the partition members. Results are streamed to a coordinating member and aggregated for the client invoking the procedure. For example, consider an 'Order' table that is partitioned by its 'customer_id', and an application wanting to execute an expensive 'credit check' for several customers. Assume the credit test requires iteration over all the order history. You can parallelize the execution on all members that manage data for these customers and stream the results to the client. All order history required by each execution is locally available in-process.
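Before an application can make the calls shown below, the procedure must be registered with a CREATE PROCEDURE statement that names its Java implementation. The following is a minimal sketch; the parameter, class, and method names are hypothetical, and the full options (including data-aware execution with ON TABLE) are described in Using Data-Aware Stored Procedures on page 163.

   create procedure order_credit_check (in customer_ids varchar(1000))
   language java parameter style java reads sql data dynamic result sets 1
   external name 'examples.CreditChecks.orderCreditCheck';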
// typical procedure call
CallableStatement callableStmt = connection.prepareCall("{CALL order_credit_check(?)}");
callableStmt.setArray(1, customerIds); // customerIds: a java.sql.Array of customer keys (placeholder)
// SQLFire data-aware procedure invocation
CallableStatement callableStmt = connection.prepareCall("{CALL order_credit_check() "
    + "ON TABLE Orders WHERE customerID IN (?)}");
callableStmt.setArray(1, customerIds); // customerIds: a java.sql.Array of customer keys (placeholder)
// order_credit_check will be executed in parallel on all members where the orders
// corresponding to the customerIDs are managed

For more information, see:
• Using Data-Aware Stored Procedures on page 163
• Using the Procedure Provider API on page 169
• Using the Custom Result Processor API on page 173
Cache Plug-ins for External Data Connections SQLFire is commonly used as a distributed SQL cache in an embedded (peer client) or client-server configuration. It provides a plug-in framework to connect the cache to external data sources such as another relational database. Several cache plug-ins are supported: • Reading from Backend on a Miss ("read through") When SQLFire is used as a cache, applications can configure a SQL RowLoader that is triggered to load the data from a backend repository on a miss in SQLFire. When an incoming query request for a uniquely identified row cannot be satisfied by the distributed cache, the loader is invoked to retrieve the data from an external source. SQLFire locks the associated row and prevents concurrent readers trying to fetch the same row from bombarding the backend database. Note: When SQLFire is used as a "pure" cache (that is, some or all tables are configured with LRU eviction, and only the actively used subset of data is cached in SQLFire), queries on these tables can be based only on the primary key. Only primary key-based queries invoke a configured loader. Any other query potentially could produce inconsistent results from the cache. See Using a RowLoader to Load Existing Data on page 193. • Synchronous Write Back to Backend ("write through") When data is strictly managed in memory, even with replication, a failure of the entire cluster means loss of data. With synchronous write-through, all changes can be synchronously written to the backend repository before the cache is changed. If and only if the write-through succeeds, the data becomes visible in the cache. You configure synchronous write-through with a SQLFire "cache writer". See Handling DML Events Synchronously on page 201. • Asynchronous Write Back to Backend ("write behind") If synchronous writes to the backend are too costly, the application can configure the use of a "write behind cache listener". SQLFire supports several options for how the events are queued, batched, and written to the database of record. It is designed for very high reliability and handles many failure conditions. Persistent queues ensure writes even when the backend database is temporarily unavailable. You can order event delivery or batch it based on a time interval. You can also conflate a certain number of rows and continuous updates to a single update, to reduce the load on the backend database of an 'update heavy' system. See Handling DML Events Asynchronously on page 203.
Chapter 3: Installing vFabric SQLFire

You can install vFabric SQLFire from the VMware yum repository (RHEL only), from a downloaded RPM file (RHEL only), or from a downloaded ZIP file. The installation procedure varies according to whether you obtained vFabric SQLFire as a standalone product or as part of VMware® vFabric Suite™ Advanced.
Installation Note for vFabric Suite Customers vFabric SQLFire is available as a standalone product installation, and as part of vFabric Suite Advanced. vFabric Suite is a set of runtime components that let you build, scale, and run modern, agile applications in virtual environments. If you obtain SQLFire standalone, you can install it on physical or virtual machines according to the procedures in this document. If you obtain SQLFire as part of vFabric Advanced, you install it exclusively on VMware virtual machines that run on vSphere. The vFabric Suite install process adds the vFabric License Server to an existing vCenter server. The vFabric License Server accepts a network license, actually a pool of licenses for vFabric components, which makes it easy to manage the licensing of all components installed on the VMs. If you obtain SQLFire as part of vFabric Suite Advanced, first complete the license activation and installation procedures in Getting Started with vFabric Suite . Then follow procedures in this document to set up your environment for SQLFire and complete any remaining SQLFire-specific installation and configuration tasks.
RHEL Only: Install vFabric SQLFire from an RPM

If your operating system is Red Hat Enterprise Linux (RHEL), you can install vFabric SQLFire from an RPM. The RPM installation process can be used whether you have purchased the standalone vFabric SQLFire product or vFabric Suite Advanced, which includes vFabric SQLFire. Use the installation procedure that matches the product(s) that you purchased:

• Install SQLFire as a Component of vFabric Suite Advanced on page 22
• Install SQLFire as a Standalone Product on page 24

See Getting Started with vFabric Suite for more information.

Prerequisites

• Confirm that your system meets the hardware and software requirements described in Supported Configurations and System Requirements on page 3.
• If you have not already done so, download and install a compatible JDK or JRE on the RHEL computer or VM.
Install SQLFire as a Component of vFabric Suite Advanced

If you purchased vFabric Suite Advanced, VMware recommends that you install vFabric SQLFire by first installing the VMware RPM repository and then using yum to perform the actual installation. Follow this procedure:

1. On the RHEL computer, start a terminal either as the root user or as an unprivileged user who has sudo privileges.

   Note: If you are not logged in as the root user, you must use the sudo command to run the commands in the following steps.

2. Install the vFabric repository RPM using the following wget command, passing it the appropriate URL. The URL differs depending on the version of RHEL you are using.

   Note: You must run the entire wget command on a single line. Be sure you include | sh at the end, or the RPM installation will not work.

   For RHEL 5:

   wget -q -O - http://repo.vmware.com/pub/rhel5/vfabric/5.3/vfabric-5.3-suite-installer | sh

   For RHEL 6:

   wget -q -O - http://repo.vmware.com/pub/rhel6/vfabric/5.3/vfabric-5.3-suite-installer | sh

   The command performs the following tasks:
   • Imports the vFabric GNU Privacy Guard (GPG) key.
   • Installs the vFabric 5.3 repository RPM.
   • Launches the VMware End User License Agreement (EULA) acceptance and repository configuration script.
   • Outputs the EULA for you to read; you must answer yes to accept the terms and continue.

3. Use the yum search vfabric or yum search vmware command to view the list of vFabric components that you can install from the VMware repository. For example (output truncated for clarity):

   yum search vfabric
   ...
   ============================= N/S Matched: vfabric =============================
   vfabric-5.3-qa-repo.noarch : vFabric 5.3 internal qa repository configuration
   vfabric-agent.x86_64 : VMware vFabric Agent
   vfabric-gemfire.noarch : VMware vFabric GemFire
   vfabric-hyperic-agent.noarch : VMware vFabric Hyperic Agent
   vfabric-hyperic-server.x86_64 : VMware vFabric Hyperic Server
   vfabric-insight-dashboard.noarch : com.springsource.insight.dist.vfabric:dashboard-rpm
   vfabric-license-server.noarch : VMware vFabric License Server
   vfabric-sqlfire.noarch : VMware vFabric SQLFire
   ...

   The vFabric SQLFire RPM is called vfabric-sqlfire.

4. Execute the yum command to install vFabric SQLFire:

   yum install vfabric-sqlfire
   Note: The yum install command installs the most recent version of the vFabric SQLFire RPM that it finds in all installed repositories. If you want to install a different version, you must explicitly specify that version with the yum install command. Use yum search vfabric-sqlfire --showduplicates to find all versions that are available in the installed repositories.

   The yum command begins the install process, resolves dependencies, and displays the packages it plans to install and the required space.

5. Enter y at the prompt to begin the actual installation. The installer downloads and installs the package, and displays Complete! when the installation process is finished.

6. (Optional.) Specify that the vfabric-sqlfire process should automatically start when the operating system starts by running the following command:

   chkconfig --level 35 vfabric-sqlfire on

7. (Optional.) Specify the configuration of the vfabric-sqlfire process by editing the file /etc/sysconfig/sqlfire, which is the file sourced by the script that you will later use to start the SQLFire process (/etc/init.d/vfabric-sqlfire). The /etc/sysconfig/sqlfire file includes many comments to help you decide whether you need to modify it. Here are additional pointers:

   • If you do not modify the /etc/sysconfig/sqlfire file but simply use the one installed by default, the vfabric-sqlfire process starts up a server instance in a multicast configuration.
   • If you want the vfabric-sqlfire process to start up using a locator-based configuration, change the LOCATOR property in the /etc/sysconfig/sqlfire file to local, as shown:

     LOCATOR=local

     This configuration allows a local locator process to start along with a local server instance. To add additional remote locators, add their IP address and port to the LOCATOR_IP_STRING as shown in the configuration file as a commented-out example.
   • If you want to start up only a local locator process and not a local server instance, set the LOCATOR property to locator_only. This sets up a redundant locator configuration; be sure you add the locator IP addresses and port numbers to the LOCATOR_IP_STRING; an example is shown in the configuration file.
   • Finally, set the LOCATOR property to remote if you want to start a local server instance that relies on having locator processes running on one or more remote hosts. Specify these remote hosts using the LOCATOR_IP_STRING property.

8. Start the processes associated with SQLFire by running the following command:

   /etc/init.d/vfabric-sqlfire start

   By default, the process uses an evaluation license; if you have purchased a production license, see Activating vFabric SQLFire Licensing on page 27 for information about configuring it in the /var/opt/vmware/sqlfire/sqlfire.properties file. The RPM installation process creates a skeleton sqlfire.properties file to get you started.

   To stop, restart, and get status about the processes, pass the stop, restart, and status parameters, respectively, to the /etc/init.d/vfabric-sqlfire script:

   /etc/init.d/vfabric-sqlfire status
Install SQLFire as a Standalone Product

If you purchased the standalone product (rather than as part of vFabric Suite Advanced), follow this procedure to download and install only the SQLFire RPM:

1. From the VMware downloads page, select VMware vFabric SQLFire. On the SQLFire 1.1 download page, perform the following steps:
   • If you are installing SQLFire for evaluation, click on the "Download Free Trial" link, register an account with VMware (if you have not done so already), and download SQLFire.
   • If you have already purchased SQLFire, locate the vFabric SQLFire licensed offering that you have purchased from the Product Downloads tab. Click on the View Download button.
   • Download the vFabric SQLFire RPM appropriate for your RHEL operating system:
     • RHEL 5: vfabric-sqlfire-1.1-1.el5.noarch.rpm
     • RHEL 6: vfabric-sqlfire-1.1-1.el6.noarch.rpm

2. On the RHEL computer, start a terminal either as the root user or as an unprivileged user who has sudo privileges.

   Note: If you are not logged in as the root user, you must use the sudo command to run the commands in the following steps.

3. Run the appropriate command to install the vFabric SQLFire RPM:

   RHEL 5:

   sudo rpm -ivh vfabric-sqlfire-1.1-1.el5.noarch.rpm

   RHEL 6:

   sudo rpm -ivh vfabric-sqlfire-1.1-1.el6.noarch.rpm

   The rpm command begins the install process, resolves dependencies, and displays the packages it plans to install. The SQLFire software is installed into the /opt/vmware/vfabric-sqlfire directory. If necessary, the install process creates a sqlfire non-interactive user in the vfabric group. This user owns the installed SQLFire directories and files, including any disk store files that you create later.

   Note: You cannot log in directly as the sqlfire user because interactive login has been disabled. Rather, you must first log in as the root user or as a user with sudo privileges. You can then execute commands as the sqlfire user by using sudo -u sqlfire command_name.

4. If the installation is successful, you see a [100%] installation status message. For example:

   Preparing...              ########################################### [100%]
      1:vfabric-sqlfire      ########################################### [100%]
5. If you have not already done so, download and install a compatible JDK or JRE on the RHEL computer or VM. 6. (Optional.) Specify that the vfabric-sqlfire process should automatically start when the operating system starts by running the following command: chkconfig --level 35 vfabric-sqlfire on
7. (Optional) Specify the configuration of the vfabric-sqlfire process by editing the file /etc/sysconfig/sqlfire, which is the file sourced by the script that you will later use to start the SQLFire process (/etc/init.d/vfabric-sqlfire.) The /etc/sysconfig/sqlfire file includes many comments to help you decide whether you need to modify it. Here are additional pointers: • If you do not modify the /etc/sysconfig/sqlfire file but simply use the one installed by default, the vfabric-sqlfire process starts up a server instance in a multicast configuration. • If you want the vfabric-sqlfire process to start up using a locator-based configuration, change the LOCATOR property in the /etc/sysconfig/sqlfire file to local, as shown: LOCATOR=local This configuration allows a local locator process to start along with a local server instance. To add additional remote locators, add their IP address and port to the LOCATOR_IP_STRING as shown in the configuration file as a commented-out example. • If you want to start up only a local locator process and not a local server instance, set the LOCATOR property to locator_only. This sets up a redundant locator configuration; be sure you add the locator IP addresses and port numbers to the LOCATOR_IP_STRING; an example is shown in the configuration file. • Finally, set the LOCATOR property to remote if you want to start a local server instance that relies on having locator processes running on one or more remote hosts. Specify these remote hosts using the LOCATOR_IP_STRING property. 8. Start the processes associated with SQLFire by running the following command: /etc/init.d/vfabric-sqlfire start By default, the process uses an evaluation license; if you have purchased a production license, see Activating vFabric SQLFire Licensing on page 27 for information about configuring it in the /var/opt/vmware/sqlfire/sqlfire.properties file. The RPM installation process creates a skeleton sqlfire.properties file to get you started. 9. To stop, restart, and get status about the processes, pass the stop, restart, and status parameters, respectively, to the /etc/init.d/vfabric-sqlfire script: /etc/init.d/vfabric-sqlfire status
Install vFabric SQLFire from a ZIP File This procedure describes how to install the SQLFire software on a single computer or VM, for either a peer client or client/server deployments. Repeat the procedure to install and license SQLFire on each physical or virtual machine where you want to run a SQLFire member. Prerequisites 1. Confirm that your system meets the hardware and software requirements described in Supported Configurations and System Requirements on page 3. 2. From the VMware downloads page, select VMware vFabric SQLFire. From the Download VMware vFabric SQLFire 1.1 page, perform the following steps: • If you are installing SQLFire for evaluation, click on the "Download Free Trial" link, register an account with VMware (if you have not done so already) and download SQLFire. • If you have already purchased SQLFire, locate the vFabric SQLFire licensed offering that you have purchased from the Product Downloads tab. Click on the View Download button. Note: You can also get SQLFire from your salesperson.
• Download the ZIP file distribution of vFabric SQLFire. Procedure 1. Change to the directory where you downloaded the SQLFire software, and unzip the ZIP file: • UNIX and Linux (Bourne and Korn shells - sh, ksh, bash). If you are using the command line, type the following command: $ unzip vFabric_SQLFire_XX_bNNNNN.zip -d path_to_product where XX corresponds to the product version of SQLFire that you are installing, bNNNNN corresponds to the build number of the software, and path_to_product corresponds to the location where you want to install SQLFire. For example: $ unzip vFabric_SQLFire_11_b40332.zip -d /opt/vmware/vfabric-sqlfire Alternatively, unzip the .zip file directly with any common ZIP extraction tool suitable for your operating system. • Windows: Open the .zip file, and extract the product files to the desired installation location on your machine. 2. Configure your SQLFire license, if you have purchased a production license. Activating vFabric SQLFire Licensing on page 27 provides more information. 3. To begin using the sqlf utility to start servers or execute other commands, add the SQLFire bin directory to your path. For example: export PATH=$PATH:/opt/vmware/vfabric-sqlfire/bin The sqlf script automatically sets the class path relative to the installation directory. 4. Repeat this procedure to install and license SQLFire on each different computer where you want to run a SQLFire member. Obtaining Modified Open Source Code Libraries Many open source licenses require that vendors who use or modify their libraries make that code available. To obtain the open source code libraries modified in SQLFire, visit the product download page, and select the Open Source tab. Download the vFabric_SQLFire_XX_bNNNNN_OSS.zip file (where XX corresponds to the product version of SQLFire that you are installing and bNNNNN corresponds to the build number of the software.) Download and read the associated disclosure information (.txt) file from the same download page. The .zip file contains both the original open source libraries and the modified source libraries.
Chapter 4: Activating vFabric SQLFire Licensing

vFabric SQLFire includes a default evaluation license that enables you to run the product tutorial and perform simple evaluation activities. You can obtain custom evaluation licensing and purchase production licensing from your VMware account manager or the VMware license portal.
Understand vFabric SQLFire License Options

vFabric SQLFire has a number of licensing options. Read the following sections to understand the different license types.

Default Evaluation License

The default evaluation license included with SQLFire allows you to run up to three non-production data stores in the distributed system. SQLFire uses the default evaluation license upon startup if custom evaluation or production licensing has not been installed. This license never expires. It is not allowed for production use.

Custom Evaluation and Production Licenses

To get expanded configuration options and to move into production with SQLFire, contact your VMware account manager. You can obtain custom evaluation licenses for your specific evaluation requirements. You can purchase a production license when you are finished with your evaluation. Your VMware account manager will help you determine the best topology and expiration options for your needs. For information on purchasing SQLFire licensing, see http://www.vmware.com/products/datacenter-virtualization/vfabric-sqlfire/buy.html. Custom evaluation licenses have an expiration date. Production licenses never expire.

vFabric SQLFire Professional and Enterprise Licenses

vFabric SQLFire uses these main license types for custom evaluation and production licensing:

• vFabric SQLFire Professional license. Enables you to deploy a SQLFire distributed system in a production environment. A license is required for each SQLFire distributed system member that hosts data, whether or not it also acts as a server in client/server deployments.
• vFabric SQLFire Enterprise license. Standalone license that enables you to deploy a SQLFire distributed system in a production environment. This license also enables you to loosely couple SQLFire distributed systems that are at different geographical locations (WAN replication). A license is required for each SQLFire distributed system member that hosts data, whether or not it also acts as a server in client/server deployments.

Because the vFabric SQLFire Enterprise license is available only as a standalone license and not as part of vFabric Suite, you must install the license in the sqlfire.properties file or as a boot property. You cannot install a vFabric SQLFire Enterprise license in the vFabric License Server. If you install SQLFire using the RPM, this file is located at /var/opt/vmware/sqlfire/sqlfire.properties.
SQLFire licenses do not restrict the number of clients that can connect to a distributed system. Any number of thin clients and peer clients (SQLFire accessor members) are permitted. vFabric SQLFire Local Licensing and vFabric Suite Licensing If you obtained vFabric SQLFire as a standalone product, you install a license locally on each physical and virtual machine that runs vFabric SQLFire. If you obtained vFabric SQLFire as part of vFabric Suite, you can either use the vFabric License Server to install your SQLFire licenses, or install the license locally in vSphere virtual machines. To use the vFabric License Server for licensing, you must first install and configure the vFabric License Server. Then you install your SQLFire serial numbers in the vFabric License Server. SQLFire will retrieve your licenses dynamically. To set up the vFabric License Server, see Install vFabric License Server on Your vCenter Server in the vFabric Suite documentation. Note: You cannot use the vFabric License Server to manage licenses on physical machines, or with standalone SQLFire licenses such as vFabric SQLFire Enterprise.
Choose a License Option Based on Topology How you plan to use vFabric SQLFire determines which SQLFire license offerings are right for your system. Licensing options are available for the following topologies: • Single-Site System with Multiple Data Stores on page 28 • Multi-Site (WAN) System on page 29 For further information about SQLFire purchase options, refer to http://www.vmware.com/products/datacenter-virtualization/vfabric-sqlfire/buy.html. Single-Site System with Multiple Data Stores Any client/server system or peer-to-peer system requires a vFabric SQLFire license for the SQLFire data stores (SQLFire members that host data). You configure these licenses for each SQLFire member. The vFabric SQLFire Professional license enables you to run multiple data stores in a single cluster, with unlimited clients.
Figure 1: Example vFabric SQLFire Professional Deployment

If you need to replicate table data across multiple SQLFire clusters, you must purchase and install a standalone vFabric SQLFire Enterprise license. See Multi-Site (WAN) System on page 29.
Installing a vFabric SQLFire Professional License

1. Obtain a vFabric SQLFire Professional license from VMware.
2. Install the license in SQLFire. Choose one of the following options:
   • On each SQLFire data store member, add the vFabric SQLFire license serial number directly to the sqlfire.properties file. See Option 1: Install Licenses Using sqlfire.properties on page 31; or
   • Create a file that contains the vFabric SQLFire license serial number, and copy the file to the vFabric serial number directory on each SQLFire data store member. See Option 2: Install Licenses Using Serial Number Files on page 32; or
   • If you have purchased SQLFire as part of vFabric Suite, install the vFabric SQLFire license using the vFabric License Server. See vFabric Suite Only: Configure vFabric SQLFire for vFabric License Server on page 32.

Multi-Site (WAN) System

If you need to replicate table data across multiple SQLFire clusters (for example, over a WAN), you must purchase and install a standalone vFabric SQLFire Enterprise license. All systems that participate in the WAN installation must be licensed. The license is configured in each data store member in the distributed system. Any number of thin clients and peer clients (SQLFire accessor members) are permitted in each cluster.
Installing a vFabric SQLFire Enterprise License

1. Obtain a vFabric SQLFire Enterprise license. You can use the same serial number in each distributed system; however, as a best practice you should obtain a unique serial number for each distributed system that communicates over the WAN.
2. Install the license in SQLFire. Choose one of the following options:
   • On each data store member in the distributed system, add the vFabric SQLFire Enterprise serial number directly to the sqlfire.properties file. See Option 1: Install Licenses Using sqlfire.properties on page 31; or
   • Create a file that contains the vFabric SQLFire Enterprise serial number, and copy the file to the vFabric serial number directory on all SQLFire data store members. See Option 2: Install Licenses Using Serial Number Files on page 32.
3. If you have obtained unique serial numbers for your other distributed systems, install the unique serial numbers on the other distributed systems by using one of the options described in step 2.
How vFabric SQLFire Manages Licensing Before you install vFabric SQLFire licenses, understand how SQLFire manages your licensing information. • How vFabric SQLFire Finds and Verifies Your License on page 30 • License Working Directory on page 30 • Local VMware vFabric Directories on page 31 How vFabric SQLFire Finds and Verifies Your License SQLFire has a default license that it uses if it cannot locate any other valid licensing. Non-default licensing is verified using a combination of the product sqlfire.jar file and the serial numbers you provide, or a license may be acquired dynamically from the vFabric License Server. SQLFire uses the first valid licensing it finds in this list: 1. Serial number specified by the license-serial-number boot property. 2. Dynamic licensing provided from a serial number file stored in the local VMware vFabric serial number directory, configured by specifying license-serial-number=dynamic as a boot property. 3. Dynamic licensing provided by the vFabric License Server, configured by specifying license-serial-number=dynamic as a boot property. If SQLFire cannot validate any of the specified licensing in the above list, the member process does not start and throws an exception. If no licensing is specified, SQLFire uses the default evaluation licensing shipped with the product. License Working Directory SQLFire stores licensing information in a directory on your system. It writes to the first writable directory it finds in this list: 1. The value of the license-working-dir boot property, if specified for the member. 2. The SQLFire member's current working directory as determined by System.getProperty("user.dir") at startup. These are the files that SQLFire writes: 1. License state files with names ending with -license.cfg. Example: vf.sf-license.cfg. 2. License events files with names ending with -events.txt. Example: vf.sf-events.txt. Note: Leave these files alone. Do not edit or delete these files, and do not alter the permissions on the files or the directory where these files are located. These files are created using the default permissions of the user who is starting up the SQLFire process. To stop or start the SQLFire process, a user needs write permissions for this directory and the files in it. Make sure that the permissions for each user are sufficient; otherwise SQLFire may throw an exception during stop or start.
Local VMware vFabric Directories The location of the local VMware vFabric home directory, if it exists, varies by operating system: Windows
%ALLUSERSPROFILE%\VMware\vFabric
Linux (or other OS)
/opt/vmware/vFabric
The location of the local VMware vFabric serial number directory, if it exists, varies by operating system: Windows
%ALLUSERSPROFILE%\VMware\vFabric
Linux (or other OS)
/etc/opt/vmware/vfabric
Install and Configure vFabric SQLFire Licenses Installation is required for all licensing except the default evaluation license, which is used automatically when other licensing is not provided. You do not need to install or configure the default evaluation license. When you obtain custom evaluation licenses or purchase production licenses, you receive one or more serial numbers to use in your SQLFire member configurations. The number of serial numbers that you receive depends on the type of licensing that you purchase. You install licensing in all members that you run as data stores. You do not need to install licensing in your clients (thin client applications or SQLFire accessors). Choose a License Option Based on Topology on page 28 describes the appropriate licensing for each topology. Note: If you obtain SQLFire as part of vFabric Suite, refer first to the license activation procedure in Getting Started with vFabric Suite in the vFabric Suite documentation. Then complete the tasks in this procedure, as necessary, to configure SQLFire-specific licensing. Where to Install Your License If you obtained vFabric SQLFire as a standalone product, you install licenses locally to each SQLFire member by modifying the sqlfire.properties file, or by supplying a boot property when you start the SQLFire member. See Option 1: Install Licenses Using sqlfire.properties on page 31 and Option 2: Install Licenses Using Serial Number Files on page 32. If you are installing SQLFire in a vSphere virtual machine as part of vFabric Suite, you can use the vFabric License Server to install your SQLFire licenses and then configure vFabric SQLFire to communicate with the license server. See vFabric Suite Only: Configure vFabric SQLFire for vFabric License Server on page 32. Note: You cannot use the vFabric License Server if you purchased a standalone SQLFire license, such as vFabric SQLFire Enterprise. Install standalone licenses in sqlfire.properties or as a boot property. Option 1: Install Licenses Using sqlfire.properties Add a license key to the sqlfire.properties file on each data store member in your deployment, or specify the keys as a boot property when you start a SQLFire member using sqlf or the FabricServer interface. Note: If you install vFabric SQLFire using an RPM, the properties file is located at /var/opt/vmware/sqlfire/sqlfire.properties. This is the recommended way to install licensing for standalone SQLFire deployments. (For SQLFire deployments installed on vSphere virtual machines as part of a vFabric Suite deployment, VMware recommends that you use
the vFabric License Server. See vFabric Suite Only: Configure vFabric SQLFire for vFabric License Server on page 32.) For example: # sqlfire.properties for data store or accessor member license-serial-number=#####-#####-#####-#####-##### If you need to specify multiple serial numbers, use a comma separated list: # sqlfire.properties for data store or accessor member license-serial-number=#####-#####-#####-#####-#####,#####-#####-#####-#####-##### Option 2: Install Licenses Using Serial Number Files Place serial numbers in serial number files in the VMware vFabric serial number directory. To configure SQLFire to use these for licensing, you specify license-serial-number=dynamic as a boot property. You can specify this property in sqlfire.properties, or when starting a SQLFire member using sqlf or the FabricServer interface. Use this procedure to install and configure serial number files for any standalone SQLFire license. 1. Create a file named vf.sf-serial-numbers.txt and paste the serial number into it. Note: If you have multiple serial numbers, enter each serial number on a new line. 2. Save the file to the appropriate serial numbers directory on each SQLFire member. See Local VMware vFabric Directories on page 31 for the appropriate directory based on your operating system. 3. On all SQLFire data store members, specify license-serial-number=dynamic in sqlfire.properties, or as a boot property. For example: # sqlfire.properties for dynamic licensing of data stores license-serial-number=dynamic
vFabric Suite Only: Configure vFabric SQLFire for vFabric License Server

You can only use this configuration method if you are running SQLFire on a vSphere virtual machine as part of vFabric Suite. This method is not applicable to installing a standalone vFabric SQLFire license, such as a vFabric SQLFire Enterprise license.

To set up vFabric License Server licensing, refer first to the license activation procedure in Activate vFabric Suite Licenses in the vFabric Suite documentation. After you install SQLFire licenses on the vFabric License Server, configure dynamic licensing in SQLFire:

1. Specify license-serial-number=dynamic in the sqlfire.properties file on each data store or accessor member. For example:

   # sqlfire.properties for dynamic licensing
   license-serial-number=dynamic

   Note: If you install vFabric SQLFire using an RPM, the properties file is located at /var/opt/vmware/sqlfire/sqlfire.properties.

2. (Optional) Change the default timeout value (10000) to indicate the maximum time in milliseconds that the member should wait when obtaining a license dynamically from the vFabric License Server. For example, in sqlfire.properties:

   # timeout in milliseconds
   license-server-timeout=20000
Verify Your License and Check Your License Usage

You can verify the validity of your license and monitor your license usage. If you install an invalid serial number or if SQLFire cannot obtain a dynamic license from the vFabric License Server, SQLFire fails to start and throws an exception. To learn how to verify your license, see "Check the Validity of an Existing License Key" in Getting Started with vFabric Suite. See Licensing Problems on page 775 for a list of common problems and their solutions.

If you obtained vFabric SQLFire as part of vFabric Suite and want to check your license usage, see Monitoring vFabric License Usage in the vFabric Suite documentation.
Chapter 5: Upgrading vFabric SQLFire

SQLFire 1.1 is not compatible with persistence files (disk stores) that were created in earlier versions of SQLFire (1.0.x). You must use the sqlf upgrade-disk-store command to upgrade disk stores to the SQLFire 1.1 format after upgrading the SQLFire software to version 1.1. Refer to the Release Notes to learn about changes that were introduced in the latest update.

Note: If you installed SQLFire using an RPM from the VMware YUM repository, see RHEL Only: Upgrade vFabric SQLFire from the VMware YUM Repository for upgrade instructions. When VMware releases a new maintenance version of SQLFire, the appropriate VMware YUM repository is updated to include the new RPM, and you can use yum to quickly upgrade SQLFire to the latest version.
Before You Upgrade
This section provides information you need to know before you begin to upgrade vFabric SQLFire.
• If you obtained SQLFire as part of a vFabric Suite package, first complete the vFabric License Server installation and license activation procedures in Getting Started with vFabric Suite. Then follow procedures in this document to set up your environment for SQLFire and to complete any remaining SQLFire-specific installation and configuration tasks.
• During an upgrade from one major SQLFire version to another, you cannot connect members in the same distributed system to other members running a different major version. Therefore, you cannot perform a rolling upgrade between major versions of vFabric SQLFire and must plan your system downtime accordingly. See Version Compatibility Rules on page 39 for more information on SQLFire version interoperability.
• If your SQLFire deployment uses DBSynchronizer, failover-related issues (such as duplicate rows in the RDBMS) can occur during an upgrade if clients are actively performing DML on synchronized tables. See How Failover and Upgrades Affect Synchronization on page 209 in the DBSynchronizer documentation.
• Read the SQLFire Release Notes to familiarize yourself with all feature changes.
• Know how to configure environment variables for your system.
• Confirm that your system meets the requirements to run SQLFire. See Supported Configurations and System Requirements on page 3. Note: To check your current Java version, type java -version at a command-line prompt. You can download Sun/Oracle Java SE from the following location: http://www.oracle.com/technetwork/java/javase/downloads/index.html.
• If you are upgrading from SQLFire 1.0.x, you must run the sqlf command upgrade-disk-store to upgrade each disk store to a format that is compatible with vFabric SQLFire 1.1. See the upgrade instructions for ZIP files or RPMs for more information.
RHEL: Upgrade vFabric SQLFire from RPM If your guest operating system is Red Hat Enterprise Linux (RHEL) and you have installed a previous version of SQLFire using yum and RPM, VMware recommends that you use the RPM distribution to upgrade vFabric SQLFire. You complete the upgrade procedure on every virtual and physical machine that runs SQLFire. Note: Thoroughly test your development systems with the new version before moving into production. When you upgrade vFabric SQLFire on RHEL from RPM, the new software is installed by default into /opt/vmware/vfabric-sqlfire/vFabric_SQLFire_XX where XX corresponds to the version of SQLFire (for example, vFabric_SQLFire_11) that you have installed. No files are overwritten during the upgrade process. Prerequisites • Confirm that your system meets the hardware and software requirements described in Supported Configurations and System Requirements on page 3. • Download the vFabric SQLFire RPM: • From the VMware downloads page, select VMware vFabric SQLFire. • If you are installing SQLFire for evaluation, click the Try Now button to download an evaluation RPM. • If you have purchased SQLFire, download the vFabric SQLFire offering you have purchased from the Product Downloads tab. You can also get SQLFire from your salesperson. • Download the vFabric SQLFire RPM appropriate for your RHEL operating system. • RHEL 5: vfabric-sqlfire-1.1-1.el5.noarch.rpm • RHEL 6: vfabric-sqlfire-1.1-1.el6.noarch.rpm
Procedure Upgrade vFabric SQLFire using the RPM: 1. Review the items listed in Prerequisites on page 36 and make any appropriate preparations. 2. Stop all members of the system running with the prior version. For example, use the shut-down-all command: sqlf shut-down-all -locators=localhost[10101] In the sample command, substitute the address and port of a locator for your SQLFire distributed system. To shut down only the local SQLFire process, use the command: /etc/init.d/vfabric-sqlfire stop 3. Execute the following rpm command to install the new SQLFire RPM. If necessary, use sudo to run the command if you are not logged in as root: RHEL 5: sudo rpm -Uvh vfabric-sqlfire-1.1-1.el5.noarch.rpm RHEL 6: sudo rpm -Uvh vfabric-sqlfire-1.1-1.el6.noarch.rpm The rpm command begins the install process, resolves dependencies, and displays the packages it plans to install.
4. If you are upgrading from SQLFire 1.0.x, run the sqlf command upgrade-disk-store to upgrade each disk store to a format that is compatible with vFabric SQLFire 1.1. See sqlf upgrade-disk-store on page 422 for more information. Note: You must update the default disk stores as well as any additional disk stores that were defined through the CREATE DISKSTORE statement. Specify the full path to each disk store in multiple invocations of the sqlf upgrade-disk-store command. SQLFire creates default disk stores in each SQLFire server or locator directory, as well as in the /datadictionary subdirectory of each SQLFire server or locator directory. The example below shows how to upgrade these default disk stores on a single member; you must repeat the commands on each machine that hosts a locator or server. By default, SQLFire disk store files are owned by the user sqlfire, and you must preserve the ownership of these files when upgrading the disk store. Use the sudo command to perform the upgrade commands as the sqlfire user, as in these example commands:
sudo -u sqlfire sqlf upgrade-disk-store SQLF-DEFAULT-DISKSTORE /var/opt/vmware/sqlfire/server
sudo -u sqlfire sqlf upgrade-disk-store SQLF-DD-DISKSTORE /var/opt/vmware/sqlfire/server/datadictionary
Replace /var/opt/vmware/sqlfire/server with the actual server or locator directory.
5. Repeat the previous step to upgrade each default and user-defined disk store on the local system.
6. Repeat all previous steps for every virtual or physical machine on which you run a vFabric SQLFire member.
7. Restart all system members according to your usual procedures.
8. Beginning with version 1.1.1, SQLFire stores normalized (all-uppercase) user names. The normalized user names are also used for password encryption. If you configured users in the previous release and the user names were not in all-uppercase letters, also complete these steps: a. Use the sqlf encrypt-password command to regenerate encrypted passwords for any system users that were defined in the previous software release using mixed-case characters. b. Use SYS.CHANGE_PASSWORD to regenerate passwords for all other users that were created using mixed-case characters.
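For step 8b, the following is a minimal sketch of regenerating a password for a user that was created with a mixed-case name in the earlier release. The user name 'SalesAdmin' and the password values are hypothetical placeholders, and SYS.CHANGE_PASSWORD is assumed to take the user ID, the old password, and the new password as its arguments:
-- Hypothetical user created as 'SalesAdmin' in SQLFire 1.0.x.
-- Re-setting the password (even to the same value) regenerates the stored
-- hash against the normalized, all-uppercase user name.
CALL SYS.CHANGE_PASSWORD('SalesAdmin', 'oldSecret', 'oldSecret');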
What to Do Next After you have upgraded, perform the following tasks: • Run the product tutorial and examples. See Tutorials on page 43. • Test your development systems with the new version.
Upgrade vFabric SQLFire from a ZIP File Upgrade vFabric SQLFire on every virtual and physical machine that will run SQLFire, using the downloaded SQLFire ZIP file. Procedure Note: Thoroughly test your development systems with the new version before moving into production. 1. Review the items listed in Before You Upgrade on page 35 and make any appropriate preparations.
2. Stop all members of the system running the prior version of SQLFire. For example, in a SQLFire deployment you can use the shut-down-all command to stop all members of the distributed system: sqlf shut-down-all -locators=mylocator[10101] 3. Install the latest version of SQLFire in a different directory than the existing version. See Install vFabric SQLFire from a ZIP File on page 25. Note: SQLFire is installed as a complete product, rather than as a modification to a prior version. The Java JRE runtime environment is not bundled with SQLFire, so you need to install and configure an appropriate JDK or JRE to comply with SQLFire requirements and your unique system needs. 4. If you are upgrading from SQLFire 1.0.x, run the sqlf command upgrade-disk-store to upgrade each disk store to a format that is compatible with vFabric SQLFire 1.1. See sqlf upgrade-disk-store on page 422 for more information. Note: Update the default disk stores as well as any additional disk stores that were defined through the CREATE DISKSTORE statement. Specify the full path to each disk store in multiple invocations of the sqlf upgrade-disk-store command. SQLFire creates default disk stores in each SQLFire server or locator directory, as well as in the /datadictionary subdirectory of each SQLFire server or locator directory. The example below shows how to upgrade these default disk stores on a single member; you must repeat the commands on each machine that hosts a locator or server. By default, SQLFire disk store files are owned by the user sqlfire, and you must preserve the ownership of these files when upgrading the disk store. Use the sudo command to perform the upgrade commands as the sqlfire user, as in these example commands: sudo -u sqlfire sqlf upgrade-disk-store SQLF-DEFAULT-DISKSTORE /var/opt/vmware/sqlfire/server sudo -u sqlfire sqlf upgrade-disk-store SQLF-DD-DISKSTORE /var/opt/vmware/sqlfire/server/datadictionary Replace /var/opt/vmware/sqlfire/server with the actual server or locator directory. 5. Repeat the previous step to upgrade each default and user-defined disk store on the local system. 6. Redeploy your environment's license or configuration files to the new version's installation. Copy and edit the files as necessary to point to the new installation directory. 7. Restart all system members according to your usual procedures. 8. Beginning with version 1.1.1, SQLFire stores normalized (all-uppercase) user names. The normalized user names are also used for password encryption. If you configured users in the previous release and the user names were not in all-uppercase letters, also complete these steps: a. Use the sqlf encrypt-password command to regenerate encrypted passwords for any system users that were defined in the previous software release using mixed-case characters. b. Use SYS.CHANGE_PASSWORD to regenerate passwords for all other users that were created using mixed-case characters. What to Do Next After you have upgraded, perform the following tasks: • Run the product tutorial and examples. See Tutorials on page 43. • Test your development systems with the new version.
Version Compatibility Rules
Upgraded SQLFire members cannot participate in a distributed system that contains SQLFire members from an earlier version. Shut down all existing cluster members before restarting the cluster with upgraded SQLFire members. Thin clients from earlier versions of SQLFire can operate against the latest version of SQLFire. However, thin clients using the latest SQLFire JDBC drivers are not compatible with older versions of SQLFire. Refer to the Release Notes to learn about changes that were introduced in the latest update. Note: Beginning with version 1.1.1, SQLFire stores normalized (all-uppercase) user names. The normalized user names are also used for password encryption. Any hashed passwords that were generated using the sqlf encrypt-password command for system users (defined in sqlfire.properties or otherwise) that are not in all upper-case characters must be regenerated. Similarly, all users that were created using the SYS.CREATE_USER procedure must regenerate passwords using SYS.CHANGE_PASSWORD if the user name was previously not in all upper-case characters.
Chapter 6: Connect to vFabric SQLFire with JDBC Tools

Third-party JDBC tools can help you browse data in tables, issue SQL commands, design new tables, and so forth. You can configure these tools to use the SQLFire JDBC thin client driver to connect to a SQLFire distributed system. Although the instructions for setting up each tool vary, the general process for establishing a connection involves configuring the JDBC client driver and setting JDBC connection URL properties. Follow these basic steps:
1. In the third-party tool, choose to configure a new driver, and select the sqlfireclient.jar file that contains the SQLFire JDBC client driver. This file is installed in the lib directory of your SQLFire installation.
2. If the tool does not automatically select a driver class, you will generally have the option of selecting a class from within the JAR file. For SQLFire, select the com.vmware.sqlfire.jdbc.ClientDriver class.
3. To use the client driver, you must specify a JDBC connection URL for your SQLFire distributed system. The basic URL format for the client driver is: jdbc:sqlfire://hostname:port/ where hostname and port correspond to the -client-bind-address and -client-port values of a SQLFire server or locator in your distributed system.
4. Your tool may require that you specify a username and password to use for connecting to the system. If the SQLFire server or locator enables authentication (using the -auth-provider boot property), enter a valid username and password combination to connect to the distributed system. If authentication is disabled, specify "app" as both the username and password values, or any other temporary value. Note: SQLFire uses the username specified in the JDBC connection as the schema name when you do not provide the schema name for a database object. SQLFire uses "APP" as the default schema. If your system does not enable authentication, you can specify "APP" for both the username and password to maintain consistency with the default schema behavior.
For a full example of configuring SQLFire with a third-party JDBC tool, see Connecting to VMware vFabric SQLFire using SQuirreL SQL on the SQLFire community site.
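For example, assuming a SQLFire locator listening for clients on myhost:1527 with authentication disabled, a tool's connection settings might look like the following; the host name is a placeholder, and the APP credentials simply mirror the default schema behavior described above:
Driver JAR:    lib/sqlfireclient.jar
Driver class:  com.vmware.sqlfire.jdbc.ClientDriver
JDBC URL:      jdbc:sqlfire://myhost:1527/
Username:      APP
Password:      APP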
Chapter 7: Tutorials

Learn to configure and use SQLFire features such as table replication and partitioning, persisting data to disk, and dynamically resizing the cluster.
Main Steps
The tutorial is divided into the following steps, which explain how to set up a cluster of SQLFire servers on multiple Java VMs and then distribute data across the cluster. Perform the steps in the sequence shown.

Step      Description
Step 1    Create a SQLFire Cluster on page 43
Step 2    Connect to the Cluster Using SQLF on page 45
Step 3    Create Replicated Tables and Execute Queries on page 46
Step 4    Implement a Partitioning Strategy on page 47
Step 5    Persist Tables to Disk on page 49
Step 6    Add Servers to the Cluster and Stop Servers on page 51
Prerequisites Install SQLFire on your local computer as described in Install vFabric SQLFire from a ZIP File on page 25.
Create a SQLFire Cluster
In this procedure you set up and start a cluster of two SQLFire servers.
Procedure
1. Begin by opening a command prompt or terminal window on the computer or VM where you installed SQLFire.
2. Move to the directory in which you installed the SQLFire software. This tutorial uses the example directory ~/vFabric_SQLFire_11_bNNNNN, but you should substitute the actual path to your installation. For example: cd ~/vFabric_SQLFire_11_bNNNNN
3. The initial SQLFire cluster in this tutorial contains two standalone SQLFire server members. Create a new directory for each server: mkdir server1 mkdir server2 Each server will use its local directory to write log files, backup disk store files, a datadictionary directory for persisting data, and a single status file, .sqlfserver.ser.
4. To manage client connections to the available SQLFire server members, the cluster in this tutorial uses a SQLFire locator member. Create a new directory for the locator member: mkdir locator A locator maintains a list of available servers in the cluster, and updates that list as servers join and leave the cluster. Locators also load balance client connections across all available servers. 5. To start or stop a SQLFire locator or a SQLFire server, you use the sqlf script (for Linux platforms) or sqlf.bat script (for Windows platforms). In either case, you must first ensure that the path to the SQLFire bin directory is part of your PATH environment variable. For example, on a Linux platform enter: export PATH=$PATH:~/vFabric_SQLFire_11_bNNNNN/bin On a Windows platform enter: set Path=%Path%;c:\vFabric_SQLFire_11_bNNNNN\bin 6. When you start a SQLFire distributed system, always begin by starting the locator member. Use the sqlf locator command to start the locator in the specified directory: sqlf locator start -dir=locator -peer-discovery-address=ip_address -peer-discovery-port=10101 \ -client-bind-address=ip_address -client-port=1527 Note: In this step and in all subsequent steps, replace ip_address with the IP address of your local system. The -peer-discovery-address and -peer-discovery-port combination defines a unique connection that all members of this SQLFire distributed system use for communicating with one another. Note: Always use a unique -peer-discovery-port number to avoid joining a cluster that is already running on your network. If other people might be evaluating SQLFire on your network, choose a port number other than 10101. The -client-bind-address and -client-port combination defines the connection that client applications use to connect to this locator. In this tutorial, all SQLFire members run on the local computer's IP address and use different port numbers to define unique connections. You can verify the peer and client connections in the locator startup messages, which are similar to: Starting SQLFire Locator using peer discovery on: ip_address[10101] Starting network server for SQLFire Locator at address localhost/127.0.0.1[1527] SQLFire Locator pid: 41149 status: running Note: By starting the locator member first, the locator can manage cluster membership from the start as new servers join and leave the distributed system. The locator member should also be the last process that you shut down when you want to stop a SQLFire distributed system. Note: As an alternative, SQLFire members can discover each other using multicast messaging. However, important SQLFire features such as WAN replication and user authentication require that a SQLFire system use locators rather than multicast for discovery. See Configuring Discovery Mechanisms on page 229. 7. Now use the sqlf server command to start both SQLFire server members and join them to the distributed system: sqlf server start -dir=server1 -locators=ip_address[10101] -client-bind-address=ip_address -client-port=1528 44
sqlf server start -dir=server2 -locators=ip_address[10101] -client-bind-address=ip_address -client-port=1529 In each command, the -locators option defines the peer discovery address to use for joining the SQLFire distributed system. (Production deployments generally use multiple locator members, in which case you would specify a comma-separated list of locator host[port] connections when starting a server.) Again, the combination of -client-bind-address and -client-port indicates that each server will listen for thin clients on a unique connection (ip_address:1528 and ip_address:1529, respectively). However, in this distributed system all clients will connect using the locator instead of making direct connections to servers. 8. Both SQLFire servers output messages similar to: Starting SQLFire Server using locators for peer discovery: ip_address[10101] Starting network server for SQLFire Server at address /ip_address[1528] SQLFire Server pid: 41502 status: running Distributed system now has 2 members. Other members: localhost(41149:locator):32977/50114 Logs generated in /Users/yozie/vFabric_SQLFire_11_bNNNNN/server1/sqlfserver.log Starting SQLFire Server using locators for peer discovery: ip_address[10101] Starting network server for SQLFire Server at address /ip_address[1529] SQLFire Server pid: 41533 status: running Distributed system now has 3 members. Other members: localhost(41149:locator):32977/50114, 10.0.1.11(41502:datastore):49915/50462 Logs generated in /Users/yozie/vFabric_SQLFire_11_bNNNNN/server2/sqlfserver.log Startup messages show the cluster membership details. By default, new SQLFire servers host data as data stores, and are automatically added to a default server group. You can optionally specify server group membership at startup using the server-groups boot property.
Connect to the Cluster Using SQLF sqlf implements a command-line tool that is based on the Apache Derby ij tool. You can use sqlf to connect to a SQLFire cluster and run scripts or interactive queries. You execute sqlf using the sqlf or sqlf.bat script. Procedure 1. In the same command prompt or terminal Window in which you started the SQLFire cluster, change to the quickstart directory: cd quickstart The quickstart directory contains example SQL script files that you will use later in the tutorial. 2. Start an interactive sqlf session: sqlf This command starts the interactive shell and displays the prompt: sqlf>. 3. Print a brief list of the available sqlf commands: help;
4. To connect to the SQLFire cluster using the JDBC thin client driver, use the connect client command and specify the host and port number of the SQLFire locator: connect client 'ip_address:1527'; Notice that SQLFire does not have the concept of a "database". When you connect to a SQLFire cluster, the distributed system that you connect to is defined by the locator (or alternately, the mcast-port) specified in the JDBC or ADO.NET connection. 5. Use the following command to view tables in the "sys" schema: show tables in sys; You will use information from many of these tables to view information and statistics about a working SQLFire cluster. 6. The sys.members table stores information about the SQLFire members. Execute the following query to see the unique ID assigned to the two SQLFire servers and the locator that you started: select id from sys.members; You will see output similar to: ID ------------------------------0.0.1.11(41533):52614/50508 localhost(41149):32977/50114 10.0.1.11(41502):49915/50462 3 rows selected The output displays the member process ID that is also logged when you start the members at the command line (41149, 41502, and 41533 in the tutorial example).
Create Replicated Tables and Execute Queries
By default, SQLFire replicates tables to members of the cluster. In this procedure you create new tables that are replicated to the SQLFire cluster.
Procedure
1. In the same sqlf session, run the ToursDB_schema.sql script to create the tables associated with the ToursDB sample database: run 'ToursDB_schema.sql'; You see DDL output such as: sqlf> CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, BUSINESS_LEVEL_FACTOR DOUBLE PRECISION, FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION, ECONOMY_SEATS INTEGER, BUSINESS_SEATS INTEGER, FIRSTCLASS_SEATS INTEGER );
0 rows inserted/updated/deleted [...] 2. Run the loadTables.sql script to populate the tables with data: run 'loadTables.sql'; The script output completes with: sqlf> insert into FLIGHTAVAILABILITY values ('US1357',2,'2004-04-18',0,0,3); 1 row inserted/updated/deleted 3. Enter the following command to show the table names that you created (tables in the APP schema): show tables in APP; 4. The new tables that you create and the data that you load are replicated on the two SQLFire servers by default. You can check whether tables are partitioned or replicated by querying information in sys.systables. Use the following query to check the data policy that SQLFire has assigned to the tables you just created: select tablename, datapolicy from sys.systables where tableschemaname='APP'; The output shows that each of the ToursDB tables that you created are replicated. SQLFire replicates tables by default if you do not use the PARTITION BY clause in the CREATE TABLE statement. Partitioning Tables on page 67 and Replicating Tables on page 77 provide more information about creating tables in SQLFire. 5. You can also use the sys.members table to determine which members host a particular table: select id from sys.systables s, sys.members where s.tablename='FLIGHTS'; In the tutorial, both of the SQLFire servers store information for each replicated table. 6. SQLFire provides query features similar to those available in other data management products. For example, the following command executes a simple query: SELECT city_name, country, language FROM cities WHERE language LIKE '%ese'; The following query executes a join between tables: SELECT city_name, countries.country, region, language FROM cities, countries WHERE cities.country_iso_code = countries.country_iso_code AND language LIKE '%ese';
Implement a Partitioning Strategy In this procedure you drop all tables in the ToursDB schema and then recreate them using a new partitioning and replication strategy. The ToursDB schema in this tutorial is similar to a 'STAR' schema, having only a few fact tables and several dimension tables. Dimension tables are generally small and change infrequently, but are commonly used in join queries. Dimension tables are good candidates for replicating across SQLFire members, because join queries can execute in parallel. The AIRLINES, CITIES, COUNTRIES, and MAPS tables are treated as dimension tables, and are replicated across the SQLFire cluster. In the tutorial it is assumed that applications frequently join these related tables based on the FLIGHT_ID column, which is chosen as the partitioning column.
FLIGHTS, FLIGHTS_HISTORY, and FLIGHTAVAILABILITY are fact tables, and they will be partitioned. You will co-locate these tables to ensure that all rows that are associated with FLIGHT_ID are maintained in a single partition. This step ensures that frequent join queries based on a selected flight are pruned to a single member and executed efficiently.
Procedure
1. In a separate terminal window or GUI editor, open the create_colocated_schema.sql file in the vFabric_SQLFire_11_bNNNNN/quickstart directory to examine the included DDL commands. The SQL script begins by dropping the existing tables in the schema:
DROP TABLE AIRLINES;
DROP TABLE CITIES;
DROP TABLE COUNTRIES;
DROP TABLE FLIGHTAVAILABILITY;
DROP TABLE FLIGHTS;
DROP TABLE MAPS;
DROP TABLE FLIGHTS_HISTORY;
Dimension tables can be replicated using the same basic CREATE statement from the preceding section of the tutorial. However, this script explicitly adds the REPLICATE keyword for clarity. For example: CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION, BUSINESS_LEVEL_FACTOR DOUBLE PRECISION, FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION, ECONOMY_SEATS INTEGER, BUSINESS_SEATS INTEGER, FIRSTCLASS_SEATS INTEGER ) REPLICATE; The FLIGHTS table is partitioned based on the FLIGHT_ID column: CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), ARRIVE_TIME TIME, MEAL CHAR(1), FLYING_TIME DOUBLE PRECISION, MILES INTEGER, AIRCRAFT VARCHAR(6), CONSTRAINT FLIGHTS_PK PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER), CONSTRAINT MEAL_CONSTRAINT CHECK (meal IN ('B', 'L', 'D', 'S')) ) PARTITION BY COLUMN (FLIGHT_ID); The remaining facts tables are also partitioned, and also colocated with the FLIGHTS table. For example: CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL ,
SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, BUSINESS_SEATS_TAKEN INTEGER DEFAULT 0, FIRSTCLASS_SEATS_TAKEN INTEGER DEFAULT 0, CONSTRAINT FLIGHTAVAIL_PK PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE), CONSTRAINT FLIGHTS_FK2 Foreign Key ( FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS ( FLIGHT_ID, SEGMENT_NUMBER) ) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS); 2. In the sqlf session, execute the create_colocated_schema.sql script to drop the existing tables and recreate them with the new partitioning and replication strategy. Execute loadTables.sql to populate the tables with data: run 'create_colocated_schema.sql'; run 'loadTables.sql'; 3. Confirm that the tables are created: show tables in APP; 4. Verify whether individual tables are replicated or partitioned: select tablename, datapolicy from sys.systables where tableschemaname='APP'; 5. The FLIGHTS table and others are now partitioned across the SQLFire cluster. Query the sys.members table again to display the member IDs that host the table: select id from sys.systables s, sys.members where s.tablename='FLIGHTS'; 6. Now use the DSID Function to see how many rows of the partitioned FLIGHT table are stored on that SQLFire server. For example: select count(*) memberRowCount, dsid() from flights group by dsid(); 7. Execute a join on both partition members in parallel. select * from flights f, flightavailability fa where f.flight_id = fa.flight_id and f.flight_id = 'AA1116'; The combined results are returned. Because the table is partitioned by FLIGHT_ID, the execution of the join is pruned to the partition that stores the value 'AA1116.' You can verify that flight_id 'AA1116' is located on only one data store using the query: select count(*), dsid() from flights where flight_id = 'AA1116';
Persist Tables to Disk
By default, a SQLFire distributed system persists only the data dictionary for the tables and indexes you create. These persistence files are stored in the datadictionary subdirectory of each locator and data store that joins the distributed system. Table data, however, is not persisted by default; if you shut down all SQLFire members, the tables are empty on the next startup. In this procedure you will persist table data to disk.
Getting Started with vFabric SQLFire
Procedure 1. In a separate terminal window or GUI editor, examine the contents of the create_persistent_schema.sql script. Notice that this script uses the PERSISTENT keyword in each CREATE TABLE statement. For example: CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL CONSTRAINT COUNTRIES_UNQ_NM Unique, COUNTRY_ISO_CODE CHAR(2) NOT NULL CONSTRAINT COUNTRIES_PK PRIMARY KEY, REGION VARCHAR(26), CONSTRAINT COUNTRIES_UC CHECK (country_ISO_code = upper(country_ISO_code) ) ) REPLICATE PERSISTENT; 2. In the sqlf session, execute the create_persistent_schema.sql script, then load table data: run 'create_persistent_schema.sql'; run 'loadTables.sql'; 3. Exit the sqlf session: exit; 4. Now list the contents of each SQLFire server directory: ls -l ../server1 ../server2 Notice that each SQLFire directory now contains several disk store files to manage the table data on disk. For example: BACKUPSQLF-DEFAULT-DISKSTORE.if BACKUPSQLF-DEFAULT-DISKSTORE_1.crf BACKUPSQLF-DEFAULT-DISKSTORE_1.drf 5. Because the data is persisted to disk, SQLFire will recover data from disk even if you shut down the data stores. Use the shut-down-all command to shut down all data stores in the cluster, and then restart the servers: cd ~/vFabric_SQLFire_11_bNNNNN sqlf shut-down-all -locators=ip_address[10101] Connecting to distributed system: locators=ip_address[10101] Successfully shut down 2 members sqlf server start -dir=server1 -locators=ip_address[10101] -client-bind-address=ip_address -client-port=1528 -sync=false sqlf server start -dir=server2 -locators=ip_address[10101] -client-bind-address=ip_address -client-port=1529 -sync=false Note: By default, SQLFire servers and locators start to a "waiting" state if the member depends on another server or locator for disk store synchronization. The server automatically continues booting after any dependent servers or locators start. If you start servers from within a shell script, you can use the sqlf locator on page 391 or sqlf server on page 405 command to wait for members to reach the running state before continuing. As an alternative, you can monitor the current status of SQLFire members using STATUS column in the MEMBERS system table. When you restart the servers, you may notice messages similar to: [info 2012/07/24 10:18:11.949 PDT tid=0x1] Region /_DDL_STMTS_META_REGION initialized with data from
/10.118.33.206:/Users/yozie/vFabric_SQLFire_11_bNNNNN/server1/./datadictionary created at timestamp 1343149012698 version 0 diskStoreId 1fc6a853-69d6-4ffe-8029-218acf165c34 is waiting for the data previously hosted at [/10.118.33.206:/Users/yozie/vFabric_SQLFire_11_bNNNNN /server2/./datadictionary created at timestamp 1343149035681 version 0 diskStoreId 49c1e827-4bb5-49cc-951b-221d47bbc92f] to be available These are not error messages. They indicate that the SQLFire member you are starting is waiting for another member to become available online. 6. Now verify that the persistent tables were reloaded: sqlf connect client 'ip_address:1527'; select id from sys.systables s, sys.members where s.tablename='FLIGHTS'; select count(*) memberRowCount, dsid() from flights group by dsid();
Add Servers to the Cluster and Stop Servers SQLFire manages data in a flexible way that enables you to expand or contract your cluster at runtime to support different loads. To dynamically add more capacity to a cluster, you add new server members and specify the -rebalance option. Procedure 1. Open a new terminal or command prompt window, and create a directory for the new server. Also set your PATH in the new terminal if necessary: cd ~/vFabric_SQLFire_11_bNNNNN mkdir server3 export PATH=$PATH:~/vFabric_SQLFire_11_bNNNNN/bin 2. Start the new server with a unique port number and specify the -rebalance option: sqlf server start -dir=server3 -locators=ip_address[10101] -client-bind-address=ip_address -client-port=1530 -rebalance 3. View the contents of the new SQLFire directory: ls server3 Notice that the new SQLFire server also persists the available table data, and contains the same disk store files as the other two servers. 4. You can view all members of the distributed system using sqlf. In the available sqlf session, execute the query: select id from sys.members; 5. Verify that all servers now host the data: select id from sys.systables s, sys.members where s.tablename='FLIGHTS'; 6. Examine the table data that each server hosts: select count(*) memberRowCount, dsid() from flights group by dsid(); 7. Exit the sqlf session: exit;
8. You can stop an individual SQLFire server by using the sqlf server stop command and specifying the server directory. To shut down all data stores at once, use the sqlf shut-down-all command: sqlf shut-down-all -locators=ip_address[10101] Connecting to distributed system: locators=ip_address[10101] Successfully shut down 3 members 9. After all data stores have stopped, shut down the locator as well: sqlf locator stop -dir=locator1
Perform Additional Tasks
After you complete the SQLFire tutorial, you can perform related tutorial tasks to explore other areas of the product.
Explore the toursDB Database
The example toursDB database manages information about air transportation for scheduling flights. If you need quick access to a database to use when trying SQLFire features, toursDB DDL and data scripts are available in the quickstart directory. To view the toursDB schema, see ToursDB sample database.
Explore SQLFire Language Extensions
The tutorial introduces the basics of partitioning, replication, and persistence using simple modifications to the toursDB DDL. See SQL Statements on page 486 to begin learning about other features that are implemented in SQLFire DDL. You can copy and modify the create_colocated_schema.sql script as necessary to implement new features.
Chapter 8: vFabric SQLFire Features and Benefits

The sections that follow summarize the main features. The SQLFire community site provides additional information about the features and benefits of SQLFire and a comparison of SQLFire to other data management systems. See also the SQLFire product page.
In-Memory Data Management with Optimized Disk Persistence
SQLFire enables applications to manage data entirely in memory by using partitioning and synchronous replication to distribute the data across numerous SQLFire members. SQLFire also provides an optimized disk persistence mechanism with a non-flushing algorithm to maintain high performance in applications that require stable, long-term storage. Applications can also use SQLFire to actively cache table data from a traditional disk-based RDBMS.
Continuous Availability, Elastically Scaled, Low Latency
A flexible architecture enables SQLFire to pool memory and disk resources from hundreds of clustered members. This clustered approach provides extremely high throughput, predictable latency, dynamic and linear scalability, and continuous availability of data. By collocating application logic with data and executing application logic in parallel, SQLFire substantially increases application throughput. It also transparently re-executes application logic if a server fails.
Highly Adaptable to Existing Applications
SQLFire is implemented entirely in Java, and it can be embedded directly within a Java application. You can also deploy SQLFire members as standalone servers that participate in a cluster. Java applications can connect to a SQLFire cluster using the provided JDBC drivers. Microsoft .NET and Mono applications can connect using the provided ADO.NET driver. The use of JDBC, ADO.NET, and SQL means that many existing database applications can be easily adapted to use a SQLFire cluster. SQLFire introduces several extensions to common SQL Data Definition Language (DDL) statements to manage data partitioning, replication, synchronization with data sources, and other features. However, most common queries and Data Manipulation Language (DML) statements are based on ANSI SQL-92, so experienced database application developers can use their knowledge of SQL when working with SQLFire.
Part IV: Managing Your Data in vFabric SQLFire

Managing Your Data in vFabric SQLFire describes how to design your database schema to take advantage of replication and partitioning in a distributed SQLFire cluster. You can use this information to design new databases, or to adapt an existing database for use with SQLFire. This guide also describes how to persist your data to disk store files, and it provides a simple procedure for importing an existing database into SQLFire.
Topics:
• Designing vFabric SQLFire Databases
• Using Server Groups to Manage Data
• Partitioning Tables
• Replicating Tables
• Estimating Memory Requirements
• Using Disk Stores to Persist Data
• Exporting and Importing Data with vFabric SQLFire
• Using Table Functions to Import Data as a SQLFire Table
Chapter 9: Designing vFabric SQLFire Databases

Unlike in traditional database design, SQLFire requires the designer to analyze the common access patterns and choose a partitioning strategy that results in queries that are pruned to a single partition. In addition, fact tables may need to be replicated in order to make data available for join operations against partitioned data. This chapter describes the basic table design principles that you can use to achieve linear scaling with SQLFire.
Design Principles of Scalable, Partition-Aware Databases A key design principle for linear scaling is a partitioning strategy that allows most data access (queries) to be pruned to a single partition, thus avoiding expensive locking or latching across multiple partitions during query execution. In a highly concurrent system having thousands of connections, multiple queries are uniformly spread across the entire data set (and therefore across all partitions), and increasing the number of data stores enables linear scalability. Given sufficient network performance, more connections can be supported without degrading the response time. Note: SQLFire supports distributed queries by parallelizing the query execution across data stores. However, each query instance on a partition can only join rows that are collocated with the partitioned data. This means that queries can join rows between a partitioned table and any number of replicated tables hosted on the data store with no restrictions. But queries that join multiple, partitioned tables have to be filtered based on the partitioning key. Query examples are provided in this section and in Query Capabilities and Limitations on page 768.
Identify Entity Groups and Partitioning Keys In relational database terms, an entity group corresponds to rows that are related to one another through foreign key relationships. Members of an entity group are typically related by parent-child relationships and can be managed in a single partition. To design a SQLFire database for data partitioning, begin by identifying "entity groups" and their associated partitioning keys. For example: • In a customer order management system, most transactions operate on data related to a single customer at a time. Queries frequently join a customer's billing information with their orders and shipping information. For this type of application, you partition related tables using the customer identity. Any customer row along with their "order" and "shipping" rows forms a single entity group having the customer ID as the entity group identity (partitioning key). Partitioning related tables using the customer identity enables you to scale the system linearly as you add more members to support additional customers. • In a system that manages a comprehensive product catalog (product categories, product specifications, customer reviews, rebates, related products, and so forth) most data access focuses on a single product at a time. In such a system, you would partition your data on the product key. 57
• In an online auction application, you may need to stream incoming auction bids to hundreds of clients with very low latency. To do so, you must manage selected "hot" auctions on a single partition so that they receive sufficient processing power. As the processing demand increases, add more partitions and route the application logic that matches bids to clients to the data store itself. • In a financial trading engine that constantly matches bid prices to asking prices for thousands of securities, partition data using ID of the security. When market data for a security changes, all of the related reference data is co-located with the matching algorithm to ensure low-latency execution. Life beyond Distributed Transactions provides additional background information about entity groups and distributed systems. Creating Partitioned Tables on page 70 describes other ways to partition a table in SQLFire. Use Entity Groups Instead of "Sharding" Sharding is a common strategy in traditional databases where database rows are horizontally partitioned into multiple "database shards." There are a number of differences between application-assisted or system-assisted sharding and SQLFire partition-aware database design: • Rebalancing: SQLFire rebalances the data automatically by making sure that related rows are migrated together and without any integrity loss. This enables you to add capacity as needed. • Distributed transactions: SQLFire transaction boundaries are not limited to a single partition. Atomicity and isolation guarantees are provided across the entire distributed system. • Parallel scatter-gather: Queries that cannot be pruned to a single partition are automatically executed in parallel on data stores. Joins can be performed between tables with the restriction that the joined rows are in fact collocated. • Subqueries on remote partitions: Even when a query is pruned to a single partition, the query can execute subqueries that operate on data that is stored on remote partitions.
Replicate Code Tables The "star" schema is the most common relational design pattern. In a star schema, large "fact" tables have foreign key relationships to one or more "code" or "dimension" tables. With SQLFire, you should denormalize code tables into fact tables when possible. When this is not possible, determine whether code tables can be replicated in the SQLFire cluster. The main goal is to partition the entities in fact tables, but to replicate the rows in slow-changing code tables on each member where the partitions are managed. In this way, a join between the fact table and any number of its code tables can be executed concurrently on each partition, without requiring network hops to other members. Replicating Tables on page 77 provides more information.
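As a minimal sketch of this pattern (the table and column names are hypothetical and are not part of the sample schemas in this guide), a small, slow-changing code table can be replicated while the fact table that references it is partitioned:
-- Small code table: replicate it to every data store that hosts the fact table.
CREATE TABLE ORDER_STATUS_CODES (
  STATUS_CODE CHAR(2) NOT NULL PRIMARY KEY,
  DESCRIPTION VARCHAR(40)
) REPLICATE;

-- Large fact table: partition it on the entity group key.
CREATE TABLE CUSTOMER_ORDERS (
  ORDER_ID INT NOT NULL PRIMARY KEY,
  CUSTOMER_ID INT NOT NULL,
  STATUS_CODE CHAR(2)
) PARTITION BY COLUMN (CUSTOMER_ID);
A join between CUSTOMER_ORDERS and ORDER_STATUS_CODES can then execute locally on each partition, because every hosting member holds a full copy of the code table.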
Dealing with Many-to-Many Relationships
Where tables have many-to-many relationships, you have a choice of strategies for handling queries that need to join non-colocated data. Note: Joins are permitted only on data that is colocated. Query execution can be distributed and executed in parallel, but the joined rows in each partition member have to be restricted to other rows in the same partition. For tables that have many-to-many relationships, choose one of the following strategies to handle queries that need to join non-colocated data:
• Use parallel, data-aware procedures to run the logic for the query on the member that stores some or all of the data (to minimize data distribution hops). The procedure should execute multiple queries and then join the results using application code. Using Data-Aware Stored Procedures on page 163 provides more information. • Split the query into multiple queries, and perform the join in the application client code.
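As a hedged sketch of the first strategy, a procedure invocation can be routed to the members that host the relevant rows by using the ON TABLE clause (see Using Data-Aware Stored Procedures on page 163); the procedure name and parameter below are hypothetical, and CUSTOMER_ORDERS is the partitioned table from the earlier sketch:
-- Run the hypothetical procedure only on the member(s) hosting the rows of
-- CUSTOMER_ORDERS that satisfy the WHERE clause, so its queries and the
-- application-level join of their results execute close to the data.
CALL MATCH_RELATED_ITEMS(100) ON TABLE CUSTOMER_ORDERS WHERE CUSTOMER_ID = 100;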
Example: Adapting a Database Schema for SQLFire If you have an existing database design that you want to deploy to SQLFire, translate the entity-relationship model into a physical design that is optimized for SQLFire design principles. The goal is to identify tables to partition or replicate in the SQLFire cluster, and determine the partitioning key(s) for partitioned tables. This generally requires an iterative process to produce the most optimal design. Procedure 1. Read Guidelines for Adapting a Database to SQLFire on page 59and the preceding sections under Designing vFabric SQLFire Databases on page 57. 2. Evaluate data access patterns to define entity groups that are candidates for partitioning. See Step 1: Determine the entity groups on page 60. 3. Identify all tables in the entity groups. See Step 2: Identify the tables in each entity group on page 60. 4. Identify the "partitioning key" for each partitioned table. The partitioning key is the column or set of columns that are common across a set of related tables. See Step 3: Define the partitioning key for each group on page 61. 5. Identify the tables that are candidates for replication. You can replicate table data for high availability, or to co-locate table data that is necessary to execute joins. See Step 4: Identify replicated tables on page 61. Guidelines for Adapting a Database to SQLFire Follow these guidelines for designing a SQLFire database or adapting an existing database to SQLFire: • Focus your efforts on commonly-joined entities. Remember that all join queries must be performed on data that is co-located. In this release, SQLFire only supports joins where the data is co-located. Co-located data is also important for transaction updates, because the transaction can execute without requiring distributed locks in a multi-phase commit protocol. • After you locate commonly-joined entities, look for parent-child relationships in the joined tables. The primary key of a root entity is generally also the best choice for partitioning key. • Understand the trade-offs associated with different partitioning and co-location strategies. The steps that follow describe how to evaluate a customer order management system. This example shows tables from the Microsoft Northwind Traders sample database.
Step 1: Determine the entity groups
Entity groups are generally coarse-grained entities that have children, grandchildren, and so forth, and they are commonly used in queries. This example chooses these entity groups:

Entity group    Description
Customer        This group uses the customer identity along with orders and order details as the children.
Product         This group uses product details along with the associated supplier information.
Step 2: Identify the tables in each entity group
Identify the tables that belong to each entity group. In this example, entity groups use the following tables.

Entity group    Tables
Customer        Customers, Orders, Shippers, Order Details
Product         Product, Suppliers, Category
Step 3: Define the partitioning key for each group
In this example, the partitioning keys are:

Entity group    Partitioning key
Customer        CustomerID
Product         ProductID
This example uses customerID as the partitioning key for the Customer group. The customer row and all associated orders will be collocated into a single partition. To explicitly colocate Orders with its parent customer row, use the colocate with clause in the create table statement: create table orders () partition by (customerID) colocate with (customers); Create the OrderDetails table in a similar fashion. In this way, SQLFire supports any queries that join any of Customer, Orders, and OrderDetails. This join query would be distributed to all partitions and executed in parallel, with the results streamed back to the client: select * from customer c , orders o where c.customerID = o.customerID; A query such as this would be pruned to the single partition that stores "customer100" and executed only on that SQLFire member: select * from customer c, orders o where c.customerID = o.customerID and c.customerID = 'customer100'; The optimization provided when queries are highly selective comes from engaging the query processor and indexing on a single member rather than on all partitions. With all customer data managed in memory, query response times are very fast. Consider how the above query would execute if the primary key was not used to partition the table. In this case, the query would be routed to each partition member where an index lookup would be performed, even though only a single member might have any data associated with the query. Finally, consider a case where an application needs to access customer order data for several customers: select * from customer c, orders o where c.customerID = o.customerID and c.customerID IN ('cust1', 'cust2', 'cust3'); Here, SQLFire prunes the query execution to only those partitions that host 'cust1', 'cust2', and 'cust3'. The union of the results is then returned to the caller. Step 4: Identify replicated tables If we assume that the number of categories and suppliers rarely changes, those tables can be replicated in the SQLFire cluster (replicated to all of the SQLFire members that host the entity group). If we assume that the Products table does change often and can be relatively large in size, then partitioning is a better strategy for that table.
So for the product entity group, table Products is partitioned by ProductID, and the Suppliers and Categories tables are replicated to all of the members where Products is partitioned. Applications can now join Products, Suppliers and categories. For example: select * from Products p , Suppliers s, Categories c where c.categoryID = p.categoryID and p.supplierID = s.supplierID and p.productID IN ('someProductKey1', ' someProductKey2', ' someProductKey3'); In the above query, SQLFire prunes the query execution to only those partitions that host 'someProductKey1', ' someProductKey2', and ' someProductKey3.'
Chapter 10: Using Server Groups to Manage Data

Use server groups to control where table data is stored.
Server Groups Overview A server group specifies the SQLFire members that will host data for a table. You use a server group to logically group SQLFire data stores for managing a table's data. Any number of SQLFire members that host data can participate in one or more server groups. You specify named server groups when you start a SQLFire data store.
A SQLFire member that is booted with host-data=false is an accessor and does not host table data, even if you specify one or more server groups. However, peer clients that host data can also participate in server groups.
By default, all servers that host data are added to the "default" server group. Different logical database schema are often managed in different server groups. For example, an order management system might manage all customers and their orders in an "Orders" schema deployed to one server group. The same system might manage shipping and logistics data in a different server group. A single peer or server can participate in multiple server groups, typically as a way to colocate related data or to control the number of redundant copies in replicated tables. With support for dynamic group membership, the number of processes hosting data for a server group can change dynamically. However, this dynamic aspect of server group membership is abstracted away from the application developer, who can look at a server group as a single logical server. Server groups only determine those peers and servers where a table's data is being managed. Tables are always accessible for any peer member of the distributed system, and from thin clients that connect to a single server. When you invoke server side procedures, you can parallelize the execution of the procedure on all members in the server group. These data-aware procedures also execute on any peer clients that belong to the server groups. Without associating tables to specific member IP addresses, the capacity of a server group can be dynamically increased or decreased without any impact to existing servers or client applications. SQLFire can automatically rebalance the tables in the server group to the newly added members.
Adding Members to Server Groups You define server group membership and/or create a server group when you start a SQLFire member using the server-groups boot property.
For example, if you start a SQLFire server from the command line with sqlf, use the server-groups property to specify the names of one or more server groups that the server should join: sqlf server start -server-groups=OrdersDB,OrdersReplicationGrp,DBProcessInstance1 In this example, the SQLFire server participates in three server groups: OrdersDB, OrdersReplicationGrp and DBProcessInstance1. If this is the first SQLFire member to define a named server group, then SQLFire creates the group and adds the new member after startup. If other SQLFire members in the cluster were booted using the same server group, then SQLFire adds this new member to the existing group. If you specify no -server-groups property, then SQLFire automatically adds the data store member to the default server group. If you start a SQLFire peer client from within a Java application, specify the server-groups property as part of the JDBC peer client connection string. For example, use the connection URL: jdbc:sqlfire:;mcast-port=33666;host-data=true;server-groups=OrdersDB,OrdersReplicationGrp,DBProcessInstance1 Starting and Configuring SQLFire Servers on page 233 and Starting SQLFire Servers with the FabricServer Interface on page 119 provide more information about specifying boot properties.
Assigning Tables to Server Groups
When you create a new table, the CREATE TABLE statement can specify the server group to which the table belongs. A partitioned table is spread across all of the members of the specified server group. A replicated table is replicated to all members of the server group. See Replicating Tables on page 77. For example, the following command creates a replicated table on two server groups:
CREATE TABLE COUNTRIES
(
  COUNTRY VARCHAR(26) NOT NULL,
  COUNTRY_ISO_CODE CHAR(2) NOT NULL PRIMARY KEY,
  REGION VARCHAR(26)
)
SERVER GROUPS (OrdersDB, OrdersReplicationGrp);
Tables in SQLFire are replicated by default if you do not specify partitioning. If you do not specify one or more server group names, tables are partitioned or replicated across all members of the default server group for that schema. This behavior may not be desirable in all situations. For example, if the data in a replicated table changes frequently, the cost of maintaining a copy on each server in the default group may be prohibitive. In this case, the application developer or system administrator can have several members participate in a new, smaller server group to limit the number of replicas. When two tables are partitioned and colocated, rows that have the same values for the partitioning columns in the two tables are forced to the same member. Colocated tables must belong to at least one common server group. As a best practice, deploy colocated tables on exactly the same server groups. See Partitioning Tables on page 67.
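As a sketch of this best practice (the table and column definitions are hypothetical, and the clause ordering follows the CREATE TABLE examples elsewhere in this guide), two colocated, partitioned tables can be created in the same server group:
-- Parent fact table, partitioned on the entity group key.
CREATE TABLE ORDERS (
  ORDER_ID INT NOT NULL PRIMARY KEY,
  CUSTOMER_ID INT NOT NULL
) PARTITION BY COLUMN (CUSTOMER_ID)
SERVER GROUPS (OrdersDB);

-- Child table, colocated with ORDERS and deployed to the same server group.
CREATE TABLE ORDER_DETAILS (
  ORDER_ID INT NOT NULL,
  CUSTOMER_ID INT NOT NULL,
  ITEM_ID INT
) PARTITION BY COLUMN (CUSTOMER_ID)
COLOCATE WITH (ORDERS)
SERVER GROUPS (OrdersDB);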
Chapter 11: Partitioning Tables
Horizontal partitioning involves spreading a large data set (many rows in a table) across members in a cluster. SQLFire uses a variant of the consistent hash algorithm to help ensure that data is uniformly balanced across all members of the target server group.
How Table Partitioning Works You specify the partitioning strategy of a table in the PARTITION BY clause of the CREATE TABLE statement. The available strategies include hash-partitioning on each row's primary key value, hash-partitioning on column values other than the primary key, range-partitioning, and list-partitioning. SQLFire maps each row of a partitioned table to a logical "bucket." The mapping of rows to buckets is based on the partitioning strategy that you specify. For example, with hash-partitioning on the primary key, SQLFire determines the logical bucket by hashing the primary key of the table. Each bucket is assigned to one or more members, depending on the number of copies that you configure for the table. Configuring a partitioned table with one or more redundant copies of data ensures that partitioned data remains available even if a member is lost. When members are lost or removed, the buckets are reassigned to new members based on load. Losing a member never changes the mapping of rows to buckets; only the assignment of buckets to members changes. You can specify the total number of buckets to use with the BUCKETS clause of the CREATE TABLE statement. The default number of buckets is 113. In SQLFire, all peer servers in a distributed system know which peers host which buckets, so they can efficiently access a row with at most one network hop to the member that hosts the data. Reads or writes to a partitioned table are transparently routed to the server that hosts the row that is the target of the operation. Each peer maintains persistent communication channels to every peer in the cluster.
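For example, the following statement is a minimal sketch (the table and column names are hypothetical) that hash-partitions a table on its primary key, keeps one redundant copy of each bucket, and overrides the default bucket count:
CREATE TABLE Orders
(
  OrderId INT NOT NULL,
  CustomerName VARCHAR(100),
  CONSTRAINT Pk_Orders PRIMARY KEY (OrderId)
)
PARTITION BY PRIMARY KEY
REDUNDANCY 1
BUCKETS 211;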
Figure 2: Partitioned Table Data
Although each bucket is assigned to one or more specific servers, you can use a procedure to relocate buckets in a running system, in order to improve the utilization of resources across the cluster. See Rebalancing Partitioned Data on SQLFire Members on page 75. You can also pre-allocate buckets before loading data into the table, to ensure that imported data is evenly distributed among table partitions. See Pre-Allocating Buckets on page 75.
Understanding Where Data Is Stored SQLFire uses a table's partitioning column values and the partitioning strategy to calculate routing values (typically integer values). It uses the routing values to determine the "bucket" in which to store the data. Each bucket is then assigned to a server, or to multiple servers if the partitioned table is configured to have redundancy. Buckets are not assigned when the table is created; assignment occurs lazily, when data is actually put into a bucket. This allows you to start a number of members before populating the table. Note: If you intend to immediately load a new, partitioned table with data, use the SYS.CREATE_ALL_BUCKETS procedure to create the buckets before loading data. This best practice ensures that partitions are balanced even if you rapidly load data into the table using concurrent threads.
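For example, assuming a partitioned table named APP.ORDERS (a hypothetical name used only for illustration), you might pre-create its buckets before a bulk load with a call such as:
call SYS.CREATE_ALL_BUCKETS('APP.ORDERS');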
Figure 3: Partitioned Data in Buckets
If you set the redundant-copies for the table to be greater than zero, SQLFire designates one of the copies of each bucket as the primary copy. All writes to the bucket go through the primary copy. This ensures that all copies of the bucket are consistent. The Group Membership Service (GMS) and distributed locking service ensure that all distributed members have a consistent view of primaries and secondaries at any moment in time across the distributed system, regardless of membership changes initiated by the administrator or by failures.
Failure and Redundancy If you have redundant copies of a partitioned table, you can lose servers without loss of data or interruption of service. When a server fails, SQLFire automatically re-routes any operations that were trying to write to the failed member to the surviving members. SQLFire also attempts to re-route failed read operations to another server if possible. If a read operation returns only a single row, then transparent failover is always possible. However, if an operation returns multiple rows and the application has consumed one or more rows, then SQLFire cannot fail over if a server involved in the query happens to go offline before all the results have been consumed; in this case the application receives a SQLException with SQLState X0Z01. All applications should account for the possibility of receiving such an exception, and should manually retry the query if such a failure occurs. Read operations are also retried if a server is unavailable when a query is performed. In this figure, M1 is reading table values W and Y. It reads W directly from its local copy and attempts to read Y from M3, which is currently offline. In this case, the read is automatically retried in another available member that holds a redundant copy of the table data.
Creating Partitioned Tables You create a partitioned table on a set of servers identified by named server groups (or on the default server group if you do not specify a named server group). Clauses in the CREATE TABLE statement determine how table data is partitioned, colocated, and replicated across the server group. This topic focuses on the partitioning_clause in CREATE TABLE. The CREATE TABLE reference page describes all of the options in more detail. The partitioning_clause controls the location and distribution of data in server groups. Using server groups and colocation is important for optimizing queries, and it is essential for cross-table joins. This version of SQLFire does not support cross-table joins for non-colocated data, so you must choose the partitioning clause carefully to enable the joins required by your application. The partitioning clause can specify column partitioning, range partitioning, list partitioning, or expression partitioning:
{
  { PARTITION BY { PRIMARY KEY | COLUMN ( column-name [ , column-name ]* ) }
  | PARTITION BY RANGE ( column-name )
      ( VALUES BETWEEN value AND value [ , VALUES BETWEEN value AND value ]* )
  | PARTITION BY LIST ( column-name )
      ( VALUES ( value [ , value ]* ) [ , VALUES ( value [ , value ]* ) ]* )
  | PARTITION BY ( expression )
  }
  [ COLOCATE WITH ( table-name [ , table-name ]* ) ]
}
[ REDUNDANCY integer-constant ]
[ BUCKETS integer-constant ]
[ MAXPARTSIZE integer-constant ]
[ RECOVERYDELAY integer-constant ]
Note: If the table has no primary key, then SQLFire generates a unique row ID that it uses for partitioning the data.
SQLFire supports the following partitioning strategies:
• Column partitioning. The PARTITION BY COLUMN clause defines a set of column names to use as the basis for partitioning. As a short-cut, you can use PARTITION BY PRIMARY KEY to refer to the table's primary key column(s). SQLFire uses an internal hash function that typically uses the hashCode() method of the underlying Java type for the specified column. For multiple columns, the internal hash function uses the serialized bytes of the specified columns to compute the hash.
• Range partitioning. The PARTITION BY RANGE clause specifies the ranges of a field that should be colocated. This ensures the locality of data for range queries and for cross-table joins. The lower limit of the range is inclusive and the upper limit is exclusive. It is not necessary for the ranges to cover the whole spectrum of possible values for the field. Values that are not covered by the ranges are automatically partitioned in the server group, but with no guarantee of locality for those values.
• List partitioning. The PARTITION BY LIST clause specifies the set of values of a field that should be colocated to optimize queries and to support cross-table joins. It is not necessary to list all of the possible values for the field. Any values that are not part of the list are automatically partitioned in the server group, but with no guarantee of locality for those values.
• Expression partitioning. The PARTITION BY ( expression ) clause is a type of hash partitioning that uses the supplied expression to specify the value on which to hash. The expression must only reference field names from the table. This allows rows to be colocated based on a function of their values.
Partitioning Examples You can partition tables by, for example, column (such as customer name), expression, priority ranges, and status.
Partition Based on Columns This statement creates a table that is partitioned by the "CustomerName" column. All rows with the same CustomerName are guaranteed to be colocated in the same process space. Here, the SERVER GROUPS clause determines the peers and servers that host data for the partitioned table. A server group is a subset of all the peers and servers that host data in the distributed system. CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY COLUMN ( CustomerName ) SERVER GROUPS ( OrdersDBServers); Partition Based on Ranges When you use the PARTITION BY RANGE clause, specify a column with multiple ranges of values to use for partitioning. The following example specifies partitioning based on three ranges of values for the "Priority" column: CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY RANGE ( Priority ) ( VALUES BETWEEN 1 AND 11, VALUES BETWEEN 11 AND 31, VALUES BETWEEN 31 AND 50 ); Partition Based on a List When you use the PARTITION BY LIST clause, specify a column name and one or more lists of column values to use for partitioning. The following example partitions the table based on three different lists of values for the "Status" column: CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId)
) PARTITION BY LIST ( Status ) ( VALUES ( 'pending', 'returned' ), VALUES ( 'shipped', 'received' ), VALUES ( 'hold' ) ); Partition Based on an Expression Expression partitioning partitions a table by evaluating a SQL expression that you supply. For example, the following statement partitions the table based on the month of the OrderDate column, using the MONTH function as the SQL expression: CREATE TABLE Orders ( OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerName VARCHAR(100), OrderDate DATE, Priority INT, Status CHAR(10), CONSTRAINT Pk_Orders PRIMARY KEY (OrderId) ) PARTITION BY ( MONTH( OrderDate ) ); Colocating Related Rows from Multiple Tables The COLOCATE WITH clause specifies the tables with which the partitioned table must be colocated. Note: Tables that are referenced in the COLOCATE WITH clause must exist at the time you create the partitioned table. When two tables are partitioned on columns and colocated, it forces partitions having the same values for those columns in both tables to be located on the same SQLFire member. For example, with range or list partitioning, any rows that satisfy the range or list are colocated on the same member for all the colocated tables. When you specify the COLOCATE WITH clause, you must use the PARTITION BY clause to specify partition columns in the target tables in the same order using the same partitioning strategy (for example, with identical ranges). The columns must also be of the same type, not considering constraints. Any REDUNDANCY or BUCKETS clause must also be the same as the tables with which it is colocated. Note: In order for two partitioned tables to be colocated, the SERVER GROUPS clauses in both CREATE TABLE statements must be identical. In order for two replicated tables to be colocated, both tables must specify the same server groups or one table's server groups must be a subset of the other table's server groups. For example, if you create the partitioned table, "countries," as follows: CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL CONSTRAINT COUNTRIES_UNQ_NM Unique, COUNTRY_ISO_CODE CHAR(2) NOT NULL CONSTRAINT COUNTRIES_PK PRIMARY KEY, REGION VARCHAR(26), CONSTRAINT COUNTRIES_UC CHECK (country_ISO_code = upper(country_ISO_code) ) ) PARTITION BY PRIMARY KEY
You can colocate another table, "cities," using the command: CREATE TABLE CITIES ( CITY_ID INTEGER NOT NULL CONSTRAINT CITIES_PK Primary key, CITY_NAME VARCHAR(24) NOT NULL, COUNTRY VARCHAR(26) NOT NULL, AIRPORT VARCHAR(3), LANGUAGE VARCHAR(16), COUNTRY_ISO_CODE CHAR(2) CONSTRAINT COUNTRIES_FK REFERENCES COUNTRIES (COUNTRY_ISO_CODE) ) PARTITION BY COLUMN (COUNTRY_ISO_CODE) COLOCATE WITH (COUNTRIES) In this example, both "countries" and "cities" are partitioned using the COUNTRY_ISO_CODE column. Rows with the same COUNTRY_ISO_CODE value are colocated on the same SQLFire members. See the CREATE TABLE reference page for more information. Making a Partitioned Table Highly Available Use the REDUNDANCY clause to specify the number of redundant copies to maintain for each partition of a table. Because SQLFire is primarily a memory-based data management system, it is important to use redundancy when necessary to enable fail-over if a member shuts down or fails. However, keep in mind that maintaining a large number of redundant copies has an adverse impact on performance, network usage, and memory usage. A REDUNDANCY value of 1 is recommended to maintain a secondary copy of the table data. For example: CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL, COUNTRY_ISO_CODE CHAR(2) NOT NULL PRIMARY KEY, REGION VARCHAR(26) ) REDUNDANCY 1 SQLFire attempts to place copies of the same bucket onto hosts that have different IP addresses if possible, to protect against machine failure. However, if there is only one machine available, SQLFire places multiple copies on that machine. Setting the enforce-unique-host boot property prevents SQLFire from ever placing multiple copies on the same machine. Set the redundancy-zone boot property to ensure that SQLFire places redundant copies on specific zones that you define. For example, to ensure that redundant copies are placed on different racks, set the redundancy-zone for each machine to the logical name of the rack on which the machine runs. See Configuration Properties on page 327. If the primary and secondary buckets for a partition are unavailable, then queries to the table can fail. For a persistent table, queries involving missing buckets fail with a PartitionOfflineException. Queries across the entire table would also fail. Queries to non-persistent tables with missing partitions can return empty results. Limiting the Memory Consumption on a Member Use the MAXPARTSIZE clause of the CREATE TABLE statement to load-balance partitioned data among the available members. The MAXPARTSIZE clause specifies the maximum memory for any partition on a SQLFire member.
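The REDUNDANCY, BUCKETS, MAXPARTSIZE, and RECOVERYDELAY clauses can be combined in a single statement. The following is a minimal sketch with hypothetical names and values, assuming MAXPARTSIZE is expressed in megabytes and RECOVERYDELAY in milliseconds:
CREATE TABLE Orders
(
  OrderId INT NOT NULL,
  CustomerName VARCHAR(100),
  CONSTRAINT Pk_Orders PRIMARY KEY (OrderId)
)
PARTITION BY COLUMN ( CustomerName )
REDUNDANCY 1
BUCKETS 113
MAXPARTSIZE 500
RECOVERYDELAY 30000;
Boot properties such as enforce-unique-host and redundancy-zone are supplied when the member starts; the following command line is illustrative only and is not taken from this guide: sqlf server start -enforce-unique-host=true -redundancy-zone=rack1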
Pre-Allocating Buckets As a best practice, use the SYS.CREATE_ALL_BUCKETS procedure to create the buckets for a partitioned table before you begin loading data. SYS.CREATE_ALL_BUCKETS immediately creates the necessary buckets on data stores for partitioned tables. (Normally SQLFire assigns buckets to servers lazily, as inserts are executed against the table.) Pre-allocating the buckets ensures that partitioned table data is evenly distributed in the cluster. If you do not use SYS.CREATE_ALL_BUCKETS, data can become skewed if you rapidly load table data using concurrent processes. If a partitioned table becomes skewed, use SYS.REBALANCE_ALL_BUCKETS to rebalance all partitioned table data in the distributed system.
Rebalancing Partitioned Data on SQLFire Members You can use rebalancing to dynamically increase or decrease your SQLFire cluster capacity, or to improve the balance of data across the distributed system. Rebalancing is a SQLFire member operation that affects partitioned tables created in the cluster. Rebalancing performs two tasks: • If a partitioned table's redundancy setting is not satisfied, rebalancing does what it can to recover redundancy. See Making a Partitioned Table Highly Available on page 74. • Rebalancing moves the partitioned table's data buckets between host members as needed to establish the best balance of data across the distributed system. For efficiency, when starting multiple members, trigger the rebalance a single time, after you have added all members. Note: If you have transactions running in your system, be careful in planning your rebalancing operations. Rebalancing may move data between members, which could cause a running transaction to fail with a TransactionDataRebalancedException. Start a rebalance operation using one of the following options: • At the command line when you boot a SQLFire server: sqlf server start -rebalance • By executing a system procedure in a running SQLFire member: call sys.rebalance_all_buckets(); This procedure initiates rebalancing of buckets across the entire SQLFire cluster for all partitioned tables. How Partitioned Table Rebalancing Works The rebalancing operation runs asynchronously. As a general rule, rebalancing takes place on one partitioned table at a time. For tables that have colocated data, rebalancing works on the tables as a group, maintaining the data colocation between the tables. You can continue to access partitioned tables while rebalancing is in progress. Queries, DML operations, and procedure executions continue while data is moving. If a procedure executes on a local data set, you may see a performance degradation if that data moves to another member during procedure execution. Future invocations are routed to the correct member. For tables that are configured with expiration based on idle time, the rebalancing operation resets the last accessed time of the table entries on buckets that are moved.
When to Rebalance a Partitioned Table You typically want to trigger rebalancing when overall capacity is increased or reduced through member startup, shutdown, or failure. You may also need to rebalance when you use partitioned table redundancy for high availability, and you have configured your table to not automatically recover redundancy after a SQLFire member fails (the default RECOVERYDELAY setting). In this case, SQLFire only restores redundancy when you invoke a rebalance operation. See Making a Partitioned Table Highly Available on page 74.
Managing Replication Failures SQLFire uses multiple failure detection algorithms to detect replication problems quickly. The SQLFire replication design focuses on consistency, and does not allow suspect members or network-partitioned members to operate in isolation. Configuring suspect-member Alerts When any member of the distributed system fails, it is important for other services to detect the loss quickly and transition application clients to other members. Any peer or server in the cluster can detect a problem with another member of the cluster, which initiates "SUSPECT" processing with the membership coordinator. The membership coordinator then determines whether the suspect member should remain in the distributed system or should be removed. Use the ack-wait-threshold property to configure how long a SQLFire peer or server waits to receive an acknowledgment from other members that are replicating a table's data. The default value is 15 seconds; you can specify a value from 0 to 2147483647 seconds. After this period, the replicating peer sends a severe alert warning to other members in the distributed system, raising a "suspect_member" alert in the cluster. To configure how long the cluster waits for this alert to be acknowledged, set the ack-severe-alert-threshold property. The default value is zero, which disables the property. How Replication Failure Occurs Failures during replication can occur in the following ways: • A replica fails before sending an acknowledgment. The most common failure occurs when a member process is terminated during replication. When this occurs, the TCP connection from all members is terminated, and the membership view is updated quickly to reflect the change. The member that initiated replication continues replicating to other members. If, instead of terminating, the process stays alive but fails to respond, the initiating member waits for a period of time and then raises an alert with the distributed system membership coordinator. The membership coordinator evaluates the health of the suspect member based on heartbeats and health reports from other members in the distributed system. The coordinator may decide to evict the member from the distributed system, in which case it communicates this change in the membership view to all members. At this point, the member that initiated replication proceeds and completes replication using available peers and servers. In addition, clients connected to this member are automatically re-routed to other members. • An "Owning" member fails. If the designated owner of data for a certain key fails, the system automatically chooses another replica to become the owner for the key range that the failed member managed. The updating thread is blocked while this transfer takes place. If at least one replica is available, the operation always succeeds from the application's viewpoint.
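For example, the ack-wait-threshold and ack-severe-alert-threshold properties described above are boot properties that you might supply when starting a member. The property names come from this guide, but the specific values and their placement on the sqlf command line are assumptions for illustration only:
sqlf server start -server-groups=OrdersDB -ack-wait-threshold=30 -ack-severe-alert-threshold=10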
Chapter 12: Replicating Tables
SQLFire server groups control which SQLFire data store members replicate the table's data. SQLFire replicates table data both when a new table is initialized in a cluster and when replicated tables are updated.
How SQLFire Replicates Tables SQLFire replicates every single row of a replicated table synchronously to each table replica in the target server group(s). With synchronous replication, table reads can be evenly balanced to any replica in the cluster, with no single member becoming a contention point. SQLFire replicates data to all peers in the server groups where the table was created. Replication occurs in parallel to other peers over a TCP channel. SQLFire replicates table data both when a new table is initialized in a cluster and when replicated tables are updated. Replication at Initialization When a non-persistent ("memory only") replicated table is created in a peer or server, it initializes itself using data from another member of the server group that hosts the table. A single data store member in the server group is chosen, and data for the table is streamed from that member to initialize the new replica. If the selected member fails during the replication process, the initializing process selects a different member in the server group to stream the data. Replication During Updates When an application updates a replicated table, SQLFire distributes the update to all replicas in parallel, utilizing the network bandwidth between individual members of the cluster. The sending peer or server locks the updated row locally, and then distributes the update to the replicas. After each of the replicas processes the update and responds with an acknowledgment message, the originating SQLFire peer returns control to the application. The update process is conservative in order to ensure that all copies of a replicated table contain consistent data. Each receiver processes the update entirely, applying constraint checks if necessary and updating the local table data, before responding with an acknowledgment. Each operation on a single row key is performed atomically per replica, regardless of how many columns are being read or updated.
Deciding When to Use Replicated Tables Code tables are often good candidates for replication. Application data is frequently normalized to maintain "code" fields in "fact" tables, and to maintain the details associated with each "code" in an associated "dimension" table. Code tables are often small and change infrequently, but they are frequently joined with their parent "fact" table in queries. Code tables of this sort are good candidates for using replicated tables.
Also note that this version of SQLFire supports joins only on co-located data. Instead of using partitioning in all cases, you should consider having applications replicate smaller tables that are joined with other partitioned tables. Note: If multiple applications update the same row of a replicated table at the same time outside of a transaction, the table data can become out of sync when SQLFire replicates those updates. Keep this limitation in mind when using replicated tables.
Creating Replicated Tables You can create replicated tables explicitly or by default, using the CREATE TABLE statement. SQLFire creates a replicated table by default when you execute a CREATE TABLE on page 505 statement and you do not include a PARTITIONING clause. You can also explicitly create a replicated table using the REPLICATE clause in the CREATE TABLE statement. For example: CREATE TABLE COUNTRIES ( COUNTRY VARCHAR(26) NOT NULL CONSTRAINT COUNTRIES_UNQ_NM Unique, COUNTRY_ISO_CODE CHAR(2) NOT NULL CONSTRAINT COUNTRIES_PK PRIMARY KEY, REGION VARCHAR(26), CONSTRAINT COUNTRIES_UC CHECK (country_ISO_code = upper(country_ISO_code) ) ) REPLICATE; Because this command omits the SERVER GROUPS clause, the example creates the 'countries' table and replicates it on members of the cluster that host data (all peers and servers in the default server group that set the host-data property to true).
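To limit the number of replicas, you can instead name a smaller server group, as discussed in Assigning Tables to Server Groups. The following is a minimal sketch with hypothetical table and server group names:
CREATE TABLE STATUS_CODES
(
  CODE CHAR(2) NOT NULL PRIMARY KEY,
  DESCRIPTION VARCHAR(40)
)
REPLICATE
SERVER GROUPS (CodeTablesGroup);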
Chapter 13: Estimating Memory Requirements
Designing a SQLFire database also involves estimating the memory requirements for your data based on the size of the actual table values and indexes, the overhead that SQLFire requires for your data, and the overall usage pattern for your data. You can estimate the memory requirements for tables using general guidelines for SQLFire overhead. Or, you can load tables with representative data and then query the SQLFire SYS.MEMORYANALYTICS table to obtain details about the memory required for individual tables and indexes.
Estimating SQLFire Overhead SQLFire requires different amounts of overhead per table and index entry depending on whether you persist table data or configure tables for overflow to disk. Add these overhead figures to the estimated size of each table or index entry to provide a rough estimate for your data memory requirements. If you already have representative data, use the SQLFire Java agent to query the SYS.MEMORYANALYTICS table to obtain a more accurate picture of the memory required to store your data. Note: All overhead values are approximate. Be sure to validate your estimates in a test environment with representative data.
Table 1: Approximate Overhead for SQLFire Table Entries
• Table not persisted, overflow not configured: approximately 64 bytes per entry
• Table persisted, overflow not configured: approximately 120 bytes per entry
• Table persisted, overflow configured: approximately 152 bytes per entry
Table 2: Approximate Overhead for SQLFire Index Entries
• New index entry: approximately 80 bytes
• First non-unique index entry: approximately 24 bytes
• Subsequent non-unique index entry: 8 bytes to 24 bytes*
*If there are more than 100 entries for a single index entry, the overhead per entry increases from 8 bytes to approximately 24 bytes.
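As an illustration only (the row size and row count here are assumptions, not figures from this guide), a persistent table without overflow that holds 1,000,000 rows averaging 200 bytes each would need roughly 1,000,000 x (200 + 120) bytes, or about 320 MB, before adding index overhead.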
Viewing Memory Usage in SYS.MEMORYANALYTICS SQLFire includes instrumentation to display the memory used by individual tables and indexes in a SQLFire member. You can view this memory usage information by starting a Java agent process when you boot SQLFire members, and then querying the SYS.MEMORYANALYTICS virtual table from a client connection. Enabling SYS.MEMORYANALYTICS Follow this procedure to enable the SYS.MEMORYANALYTICS virtual table.
Prerequisites
• Create an evaluation SQLFire distributed system to determine the memory footprint for your data. Do not enable memory analytics on a production system.
• You must have representative table data and index entries in order to accurately evaluate the memory footprint. Create the necessary SQL scripts to automate creating your schema and loading data into your tables.
• Consider using fewer SQLFire members in the evaluation system to simplify the evaluation process.
• If your SQLFire distributed system uses locators for member discovery, be sure to enable the Java agent on the locator as well as the data store members. Although a locator does not generally host data, you can use the locator data to compare data memory footprints with non-data memory footprints.
Procedure Follow these steps to start a SQLFire member with the Java agent to provide memory analytics:
1. Use the -javaagent:jar_path JVM option to specify the sqlfire.jar file in your installation when you start each SQLFire member. For example, if you use sqlf to start a SQLFire server: sqlf server start -client-port=1527 -J-javaagent:c:\vFabric_SQLFire_11_bNNNNN\lib\sqlfire.jar Specify the complete path and filename of sqlfire.jar for your system. If you use a locator for member discovery, also specify the -javaagent:jar_path option when starting the locator.
2. If your schema and data are not already available in the SQLFire system (as persisted data), run any necessary SQL scripts to create the schema and load table data. For example: cd c:\vFabric_SQLFire_11_bNNNNN\quickstart sqlf sqlf> connect client 'localhost:1527'; sqlf> run 'create_colocated_schema.sql'; sqlf> run 'loadTables.sql';
3. Connect to SQLFire and query the SYS.MEMORYANALYTICS table to view the memory usage information: sqlf sqlf> connect client 'localhost:1527'; sqlf> select * from sys.memoryanalytics; SQLENTITY |ID |MEMORY ---------------------------------------------------------------APP.FLIGHTS (Entry Size, Value Size, Row Count) |dyozie-e4310(6880):3439/59731 |30352,31436,542 APP.FLIGHTS.DESTINDEX (Index Entry Overhead, SkipList Size, Max& |dyozie-e4310(6880):3439/59731 |2104,696,3 (2803 = 2.74 kb)
APP.FLIGHTS.DESTINDEX (Index Entry Size, Value Size, Row Count) |dyozie-e4310(6880):3439/59731 |4888,3784,87 [...]
Understanding Table and Index Values Querying the SYS.MEMORYANALYTICS table provides run-time information about the tables and indexes available in SQLFire.
Table Values
Each table has a single row in SYS.MEMORYANALYTICS, identified with the SQLENTITY format: schema_name.table_name (Entry Size, Value Size, Row Count). The MEMORY column displays the values separated by commas:
• Entry Size - The per-entry overhead, in bytes. (This excludes the Value Size below.)
• Value Size - The total size of a table row, in bytes. (This includes the Entry Size overhead.)
• Row Count - The total number of rows stored on the local SQLFire member. For a partitioned table, this includes all buckets for the table, as well as primary and secondary replicas.
For example, the following row from SYS.MEMORYANALYTICS reports an Entry Size of 30352 bytes and a Row Count of 542 rows for the APP.FLIGHTS table: SQLENTITY |ID |MEMORY ---------------------------------------------------------------APP.FLIGHTS (Entry Size, Value Size, Row Count) |dyozie-e4310(6880):3439/59731 |30352,31436,542
Index Values
Each index has two rows in SYS.MEMORYANALYTICS. The first row uses the SQLENTITY format: schema_name.table_name.index_name (Index Entry Overhead, SkipList Size, Max Level). This row provides details about concurrent skip lists for the index. The MEMORY column displays these values separated by commas:
• Index Entry Overhead - The number of linked list objects that the index uses for storing the entry.
• SkipList Size - The number of linked list objects that the index uses for skipping values to expedite searches.
• Max Level - The total number of skip lists that are currently available for lookups.
For example, the following row from SYS.MEMORYANALYTICS shows that APP.FLIGHTS.DESTINDEX uses 2803 bytes for skip lists: SQLENTITY |ID |MEMORY ---------------------------------------------------------------APP.FLIGHTS.DESTINDEX (Index Entry Overhead, SkipList Size, Max& |dyozie-e4310(6880):3439/59731 |2104,696,3 (2803 = 2.74 kb)
The second row for an index uses the SQLENTITY format: schema_name.table_name.index_name (Index Entry Size, Value Size, Row Count). This row provides information similar to the values for a table entry. The MEMORY column displays these values separated by commas:
• Index Entry Size - The per-index overhead, in bytes. (This includes the Index Entry Overhead from the previous table, but excludes the Value Size below.)
• Value Size - The size of the data structure used to point to the underlying region entries, in bytes. For a unique index, this corresponds to the size of a pointer reference.
• Row Count - The total number of entries in the index.
For example, the following row from SYS.MEMORYANALYTICS shows that APP.FLIGHTS.DESTINDEX has 4888 bytes of overhead with 87 index entries: SQLENTITY |ID |MEMORY ---------------------------------------------------------------APP.FLIGHTS.DESTINDEX (Index Entry Size, Value Size, Row Count) |dyozie-e4310(6880):3439/59731 |4888,3784,87 Displaying the Total Memory Footprint You can query SYS.MEMORYANALYTICS with an optimizer hint to display the complete memory footprint of tables and indexes. Use the sizerHints=withMemoryFootPrint hint with your query to display the memory footprint: SELECT * FROM sys.memoryAnalytics -- SQLFIRE-PROPERTIES sizerHints=withMemoryFootPrint ; Note: When processing a SQL script file, SQLFire does not recognize the terminating semicolon character if it appears at the end of a SQL comment line. For this reason, you must place the terminating semicolon on a separate line by itself, if the optimizer hint extends to the end of the statement. When you include this optimizer hint, the query displays an additional row for each table and index, summarizing the memory footprint for the SQLFire, GemFire, and other components. For example: SQLENTITY |ID |MEMORY ---------------------------------------------------------------[...]
APP.FLIGHTS (sqlfire,gemfire,others) |dyozie-e4310(6880):3439/59731 |12912,299736,680704 (993352 = 970.07 kb) APP.FLIGHTS.DESTINDEX (sqlfire,gemfire,others) |dyozie-e4310(6880):3439/59731 |5072,0,3288 (8360 = 8.16 kb) [...]
Chapter 14: Using Disk Stores to Persist Data
By default, a SQLFire distributed system persists only the data dictionary for the tables and indexes you create. These persistence files are stored in the datadictionary subdirectory of each locator and data store that joins the distributed system. Table data, however, is not persisted by default; if you shut down all SQLFire members, the tables are empty on the next startup. You have the option to persist table data to disk as a backup of the in-memory copy, or to overflow table data to disk when memory use gets too high.
Overview of Disk Stores The two disk store options, overflow and persistence, can be used individually or together. Overflow uses disk stores as an extension of in-memory table management for both partitioned and replicated tables. Persistence stores a redundant copy of all table data managed in a peer. See Evicting Table Data from SQLFire on page 197 for more information about configuring tables to overflow to disk. Data Types for Disk Storage Disk storage is available for these data types: • Table data. Persist and/or overflow table data managed in SQLFire peers. • Gateway sender queues. Persist gateway sender queues for high availability in a WAN deployment. These queues always overflow. • AsyncEventListener and DBSynchronizer queues. Persist these queues for high availability. These queues always overflow, and can be persistent. You can store data from multiple tables and queues in a single disk store. Creating Disk Stores and Using the Default Disk Store You create named disk stores in the data dictionary using the CREATE DISKSTORE DDL statement. Individual tables can then specify named disk stores in their CREATE TABLE DDL statements to use the stores for persistence and/or overflow. You can store data from multiple tables and queues in the same named disk store. See Guidelines for Designing Disk Stores on page 91. Tables that do not name a disk store but specify persistence or overflow in their CREATE TABLE statement use the default disk store. The location of the default diskstore is determined by the value of the sys-disk-dir boot property. The default disk store is named SQLF-DEFAULT-DISKSTORE. Gateway sender queues, AsyncEventListener queues, and DBSynchronizer queues can also be configured to use a named disk store. The default disk store is used if you do not specify a named disk store when creating the queue. See CREATE GATEWAYSENDER on page 498 or CREATE ASYNCEVENTLISTENER on page 490.
Peer Client Considerations for Persistent Data Peer clients (clients started using the host-data=false property) do not use disk stores and can never persist the SQLFire data dictionary. Instead, peer clients rely on other data stores or locators in the distributed system for persisting data. If you use a peer client to execute DDL statements that require persistence and there are no data stores available in the distributed system, SQLFire throws a data store unavailable exception (SQLState: X0Z08). You must start locators and data stores before starting peer clients in your distributed system. If you start a peer client as the first member of a distributed system, the client initializes an empty data dictionary for the distributed system as a whole. Any subsequent data store that attempts to join the system conflicts with the empty data dictionary and fails to start with a ConflictingPersistentDataException. Shared-Nothing Disk Store Design Individual SQLFire peers that host table data manage their own disk store files, completely separate from the disk store files of any other member. When you create a disk store, you can define certain properties that specify where and how each SQLFire peer should manage disk store files on its local file system. SQLFire supports persistence for replicated and partitioned tables. The disk store mechanism is designed for locally-attached disks. Each peer manages its own local disk store, and does not share any disk artifacts with other members of the cluster. This shared-nothing design eliminates the process-level contention that is normally associated with traditional clustered databases. Disk stores use rolling, append-only log files to avoid disk seeks completely. No complex B-Tree data structures are stored on disk; instead SQLFire always assumes that complex query navigation is performed using in-memory indexes. Disk stores also support the SQLFire data rebalancing model. When you increase or decrease capacity by adding or removing peers in a cluster, the disk data also relocates itself as necessary. What SQLFire Writes to the Disk Store For each disk store, SQLFire stores detailed information about related members and tables. SQLFire stores these items in each disk store: • List of members that host the store and information on their status, such as running or offline and time stamps. • List of tables that use the disk store. For each table in the disk store, SQLFire stores: • Configuration attributes pertaining to loading and capacity management, used to load the data quickly on startup. • Table DML operations. Disk Store State Disk store access and management differs according to whether the store is online or offline. When a member shuts down, its disk stores go offline. When the member starts up again, its disk stores come back online in the SQLFire cluster. • Online, a disk store is owned and managed by its member process. • Offline, the disk store is simply a collection of files in your host file system. The files are open to access by anyone with the right file system permissions. You can copy the files using your file system commands, for backup or to move your member's disk store location. You can also run maintenance operations on the offline disk store, such as file compaction and validation, using the sqlf utility. Note: The files for a disk store are used by SQLFire as a group. Treat them as a single entity. If you copy them, copy them all together. Do not change the file names or permissions.
When a disk store is offline, its data is unavailable to the SQLFire distributed system. For partitioned tables, the data is split between multiple members, so you can access the offline data only if you store replicas of the partitioned table on other members of the cluster.
Disk Store Directories
When you create a disk store, optionally specify the location of directories where sqlf stores persistence-related files. sqlf generates disk store artifacts in a directory that it chooses in the following order:
1. If you provide absolute directory paths, sqlf uses the paths as-is. You must ensure that the proper directory structure exists.
2. If you provide relative directory paths, or you do not specify any directories, then the path resolution is done in this order:
   1. If the sys-disk-dir boot property specifies a directory, the disk store path is resolved relative to that directory or is used as the directory for persistence.
   2. If sys-disk-dir is not set, SQLFire uses the directory from which the system was started (the current directory) for path resolution, or it uses that directory as the persistence directory.
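For example, you might pin a disk store's files to specific directories when you create it. The following statement is an illustrative sketch; the store name, directory paths, and per-directory size limits (in megabytes) are assumptions, not values from this guide:
CREATE DISKSTORE OrdersOverflowStore
  ('/data/disk1/sqlfire' 10240, '/data/disk2/sqlfire' 10240);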
Disk Store Persistence Attributes SQLFire persists data on disk stores in synchronous or asynchronous mode. You configure the persistence mode for a table in the CREATE TABLE statement, while attributes to control asynchronous persistence are configured in the CREATE DISKSTORE statement. In synchronous mode, SQLFire writes each DML operation to the OS buffer as part of the statement execution. This mode provides greater reliability than asynchronous mode, but with lower performance. In asynchronous mode, SQLFire batches DML statements before flushing them to the OS buffers. This is faster than synchronous mode, but batch operations may be lost if a failure occurs. (You can use redundancy to ensure that updates are successfully logged on another machine.) In asynchronous mode, you can control the frequency of flushing batches by setting the following attributes when you create a named disk store: • QUEUESIZE sets the number of affected rows that can be asynchronously queued. After this number of pending rows are queued, operations begin blocking until some of the modified, created, or deleted rows are flushed to disk. • TIMEINTERVAL sets the number of milliseconds that can elapse before queued data is flushed to disk. See CREATE DISKSTORE on page 494. Note: Always start all SQLFire servers and peer processes that host data on disk in parallel. A SQLFire process with persistent data may wait for other processes to start up first to guarantee consistency. Always use the shut-down-all command to gracefully shut down a cluster. This allows each member to reach a consistent replication state and record that state with other replicas in the cluster. When you restart peers after a graceful shutdown, each member can recover in parallel without waiting for others to provide consistent replication data. Disk Store File Names and Extensions Disk store files include store management and access control files and the operation log, or oplog, files, consisting of one file for deletions and another for all other operations. The following tables describe file names and extensions; they are followed by example disk store files.
File Names
File names have three parts.
First Part of File Name: Usage Identifier
• OVERFLOW - Oplog data from overflow tables and queues only. Example: OVERFLOWoverflowDS1_1.crf
• BACKUP - Oplog data from persistent and persistent+overflow tables and queues. Examples: BACKUPoverflowDS1.if, BACKUPSQLF-DEFAULT-DISKSTORE.if
• DRLK_IF - Access control (locking the disk store). Examples: DRLK_IFoverflowDS1.lk, DRLK_IFSQLF-DEFAULT-DISKSTORE.lk
Second Part of File Name: Disk Store Name
• Non-default disk stores - The name specified in the CREATE DISKSTORE statement. Examples: name="OVERFLOWDS1" gives DRLK_IFOVERFLOWDS1.lk; name="PERSISTDS1" gives BACKUPPERSISTDS1_1.crf. Note: SQLFire internally converts and uses disk store names in upper case characters, even if you specify lower case characters in the DDL statement.
• SQLF-DEFAULT-DISKSTORE - Default disk store name, used when persistence or overflow are specified on a table or queue but no disk store is named. Examples: DRLK_IFSQLF-DEFAULT-DISKSTORE.lk, BACKUPSQLF-DEFAULT-DISKSTORE_1.crf
• SQLF-DD-DISKSTORE - Default disk store for persisting the data dictionary. Example: BACKUPSQLF-DD-DISKSTORE_1.crf
Third Part of File Name: oplog Sequence Number
• Sequence number in the format _n - Used for oplog data files only; numbering starts with 1. Examples: OVERFLOWoverflowDS1_1.crf, BACKUPpersistDS1_2.crf, BACKUPpersistDS1_3.crf
File Extensions
• if - Disk store metadata. Stored in the first disk-dir listed for the store. Negligible size; not considered in size control.
• lk - Disk store access control. Stored in the first disk-dir listed for the store. Negligible size; not considered in size control.
• crf - Oplog: create, update, and invalidate operations. Pre-allocated 90% of the total max-oplog-size at creation.
• drf - Oplog: delete operations. Pre-allocated 10% of the total max-oplog-size at creation.
• krf - Oplog: key and crf offset information. Created after the oplog has reached the max-oplog-size. Used to improve performance at startup.
Disk Store Operation Logs At creation, each operation log is initialized at the disk store's MAXLOGSIZE value, with the size divided between the crf and drf files. SQLFire only truncates the unused space on a clean shutdown (for example, sqlf server stop or sqlf shut-down-all). After the oplog is closed, SQLFire also attempts to create a krf file, which contains the key names as well as the offset for the value within the crf file. Although this file is not required for startup, if it is available, it improves startup performance by allowing SQLFire to load the entry values in the background after the entry keys are loaded. When an operation log is full, SQLFire automatically closes it and creates a new log with the next sequence number. This is called oplog rolling. Note: Log compaction can change the names of the disk store files. File number sequencing is usually altered, with some existing logs removed or replaced by newer logs with higher numbering. SQLFire always starts a new log at a number higher than any existing number. The system rotates through all available disk directories to write its logs. The next log is always started in a directory that has not reached its configured capacity, if one exists. When Disk Store oplogs Reach the Configured Disk Capacity If no directory exists that is within its capacity limits, how SQLFire handles this depends on whether automatic compaction is enabled. • If AUTOCOMPACT is enabled (set to 'true'), SQLFire creates a new oplog in one of the directories, going over the limit, and logs a warning that reports: Even though the configured directory size limit has been exceeded a new oplog will be created. The current limit is of XXX. The current space used in the directory is YYY. Note: When auto-compaction is enabled, directory sizes do not limit how much disk space is used. SQLFire performs auto-compaction, which should free space, but the system may go over the configured disk limits. • If auto-compaction is disabled, SQLFire does not create a new oplog. DML operations to tables block, and SQLFire logs the error: Disk is full and rolling is disabled. No space can be created
Factors Contributing to High Disk Throughput SQLFire disk store design contains several factors that contribute to very high disk throughput. They include pooling, avoiding seeks, and buffered logging.
Pooling Each SQLFire member manages its own disk store, and no disk contention exists between processes. Each partition can manage its data on local disks. If the application "write" load can be uniformly balanced across the cluster, the aggregate disk throughput is equal to the disk transfer rate times the number of partitions, assuming a single disk per partition. Disk transfer rates can be up to 100MB/sec on commodity machines today. Avoiding Seeks Random access on disk, which causes disk heads to seek for hundreds of concurrent threads, is probably the single biggest reason why traditional databases do not scale. Average disk seek times today are still 2ms or higher. SQLFire manages most (or all) of the data in cluster memory, and all reads are served without navigating through BTree-based indexes and data files on disk. This is the case even when data is persisted. Note, however, that in overflow-only mode, data files on disk are accessed as necessary.
Buffered Logging When writes do occur, SQLFire simply logs the operations to disk in "append-only" oplog files. By appending, SQLFire can continuously write to consecutive sectors on disk without requiring disk head movement. SQLFire flushes writes to the OS buffer rather than using 'fsync' to force writes all the way to disk. The writes are buffered by the IO subsystem in the kernel, which allows the IO scheduler to merge and sort disk writes to achieve the highest possible disk throughput. Write requests need not initiate any disk I/O until some time in the future. Thus, from the perspective of a user application, write requests stream at much higher speeds, unencumbered by the performance of the disk. The risk of data loss due to sudden failures at a hardware level is mitigated by having multiple members writing in parallel to disk. In fact, it is assumed that hardware will fail, especially in large clusters and data centers, and that software needs to take these failures into account. The SQLFire system is designed to recover in parallel from disk and to guarantee data consistency when data copies on disk do not
agree with each other. Each member of the distributed system logs membership changes to its persistent files and uses this information during recovery to determine the replica that has the latest changes, and members automatically synchronize these changes at startup.
Guidelines for Designing Disk Stores Work with your system designers and developers to plan for disk storage requirements in testing and production database systems. Work with host system administrators to determine where to place your disk store directories on each peer computer. Consider these guidelines when designing disk stores:
• Tables can be overflowed, persisted, or both. For efficiency, place table data that is overflowed on one disk store with a dedicated physical disk. Place table data that is persisted, or persisted and overflowed, on another disk store on a different physical disk. For example, gateway sender, AsyncEventListener, and DBSynchronizer queues are always overflowed and may be persisted. Assign them to overflow disk stores if you do not persist, and to persistence disk stores if you do. Ensure that each disk store resides on a separate physical disk, for best performance.
• When calculating your disk requirements, consider your table modification patterns and compaction strategy. SQLFire creates each oplog file at the specified MAXLOGSIZE. Obsolete DML operations are only removed from the oplogs during compaction, so you need enough space to store all operations that are done between compactions. For tables where you are doing a mix of updates and deletes, if you use automatic compaction, a good upper bound for the required disk space is (1 / (1 - (compaction_threshold/100))) * data size, where data size is the total size of all the table data you store in the disk store. So, for the default COMPACTIONTHRESHOLD of 50, the disk space is roughly twice your data size. The compaction thread could lag behind other operations, causing disk use to rise above the threshold temporarily. If you disable automatic compaction, the amount of disk required depends on how many obsolete operations accumulate between manual compactions.
• Based on your anticipated disk storage requirements and the available disks on your host systems:
• Make sure the new storage does not interfere with other processes that use disk on your systems. If possible, store your files to disks that are not used by other processes, including virtual memory or swap space. If you have multiple disks available, for the best performance, place one directory on each disk.
• Use different directories for different peers that run on the same computer. You can use any number of directories for a single disk store.
• Choose disk store names that reflect how the stores should be used and that work for your operating systems. Disk store names are used in the disk file names:
• Use disk store names that satisfy the file naming requirements for your operating system. For example, if you store your data to disk in a Windows system, your disk store names cannot contain any of these reserved characters: < > : " / \ | ? *.
• Do not use very long disk store names. The full file names must fit within your operating system limits. On Linux, for example, the standard limitation is 255 characters.
• Create each disk store with CREATE DISKSTORE before you create persistent or overflow tables.
• You may choose to parallelize disk access for oplog and overflow files by targeting disk store files to multiple logical disk partitions.
• SQLFire disk store files must be highly available. Back up disk store files on a regular schedule, either by copying the files while the system is offline, or by using the sqlf backup command to perform online backups.
SQLFire peers in the cluster manage their local disk stores using the properties you specify in the CREATE DISKSTORE statement. After you create named disk stores, you can create tables that persist or overflow their data to disk stores.
Related Topics
CREATE DISKSTORE on page 494
CREATE TABLE on page 505
Persist Table Data to a Disk Store on page 92
Evicting table data from memory: Use eviction to remove table data from memory, or to persist the overflow data in a disk store.
Creating a Disk Store or Using the Default You can create a disk store for persistence and/or overflow or use the default disk store. Data from multiple tables can be stored in the same disk store. Default Disk Stores Tables that do not name a disk store but specify persistence or overflow in their CREATE TABLE statement are automatically assigned to the default disk store, SQLF-DEFAULT-DISKSTORE. Also, gateway, AsyncEventListener, and DBSynchronizer queues always use the default disk store. The default disk store is saved to the SQLFire data store's working directory, unless you change the value of the sys-disk-dir boot property to specify another location. Note: SQLFire locator and data store members also create disk store files in order to persist the data dictionary for the tables and indexes created in the SQLFire distributed system. These persistence files are stored in the datadictionary subdirectory of each locator and data store that joins the distributed system. The data dictionary is always persisted, regardless of whether you configure data persistence or overflow for individual tables. Table data is not persisted by default; if you shut down all SQLFire members, the tables are empty on the next startup. Never move or modify the datadictionary subdirectory or the associated disk store files in datadictionary. If the data dictionary of a SQLFire locator or data store member is unavailable, it can prevent other members from starting if the "offline" member potentially holds a more recent copy of the data dictionary. In this case, members will display a ConflictingPersistentDataException when attempting to start. Create a Disk Store You create a named disk store in the data dictionary using the CREATE DISKSTORE DDL statement. You then assign the disk store to an individual table by specifying the disk store in the table's CREATE TABLE DDL statement. You can store data from multiple tables and queues in the same named disk store.
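The following pair of statements is a minimal sketch of that workflow; the store name, attribute values, and table definition are hypothetical, and the TIMEINTERVAL and QUEUESIZE attributes are the asynchronous-persistence attributes described in Disk Store Persistence Attributes on page 87:
CREATE DISKSTORE OrdersDiskStore
  TIMEINTERVAL 100
  QUEUESIZE 1000;
CREATE TABLE Orders
(
  OrderId INT NOT NULL PRIMARY KEY,
  ItemId INT
)
PERSISTENT 'OrdersDiskStore' ASYNCHRONOUS;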
Persist Table Data to a Disk Store You configure the persistence settings for a partitioned or replicated table when you create the table with the CREATE TABLE DDL statement. SQLFire automatically recovers data from disk for persistent tables when you restart SQLFire members. Procedure 1. Ensure that the data dictionary is persisted in your SQLFire cluster. SQLFire persists the data dictionary by default for all data stores, but you can explicitly enable or disable data dictionary persistence using the persist-dd boot property. Note: All SQLFire data stores in the same cluster must use a consistent persist-dd value. Accessors cannot persist data, and you cannot set persist-dd to true for an accessor.
2. Create the disk store that you want to use for persisting the table's data, or use the default disk store. See Creating a Disk Store or Using the Default on page 92. 3. Specify table persistence and the named disk store in the CREATE TABLE statement. For example: CREATE TABLE Orders(OrderId INT NOT NULL,ItemId INT ) persistent 'OrdersDiskStore' asynchronous This example uses asynchronous writes to persist table data to the "OrdersDiskStore." Note: Persistent tables must be associated with a disk store. If you do not specify a named disk store in the CREATE TABLE statement, SQLFire persists the table to the default disk store. For example, the following statement persists the new table to the default disk store: CREATE TABLE Orders(OrderId INT NOT NULL,ItemId INT ) persistent asynchronous Note: When you specify asynchronous persistence, asynchronous writes to the disk store use certain disk store attributes. See Disk Store Persistence Attributes on page 87.
Related Topics
CREATE DISKSTORE on page 494
CREATE TABLE on page 505
Evicting table data from memory: Use eviction to remove table data from memory, or to persist the overflow data in a disk store.
Optimizing Availability and Performance Be aware of what you can do to optimize availability and performance of disk stores. Consider these guidelines: • When you start your system, start all the members that have persistent tables in parallel. Create and use startup scripts for consistency and completeness. • Shut down your system using the sqlf shut-down-all command. This is an ordered shutdown that shuts down all data stores and accessors, but leaves locators and JMX agents running. When shutting down an entire system, a locator should be the last member to shut down (after all data stores have successfully stopped). • Decide on a file compaction policy and, if needed, develop procedures to monitor your files and execute regular compaction. • Decide on a backup strategy for your disk stores and follow it. You can back up by copying the files while the system is offline, or you can back up an online system using the sqlf backup command. • If you drop or alter any persistent table while your disk store is offline, consider synchronizing the tables in your disk stores.
Starting System with Disk Stores When you shut down a member that persists data, the data remains in the disk store files, available to be reloaded when the member starts up again. Keep in mind that peer clients are dependent on locators or data store members to persist data, as they cannot persist data on their own. The following sections explain what happens during startup and shutdown: • Shutdown: Most Recent Data from the Last Run on page 93 • Startup Process on page 94 • Example Startup Scenarios on page 95
Shutdown: Most Recent Data from the Last Run
If more than one member has the same persistent table or queue, the last member to exit leaves the most up-to-date data on disk. SQLFire stores information on member exit order in the disk stores, so it can start your members with the most recent data set:
• For a persistent replicated table, the last member to exit leaves the most recent data on disk.
• For a partitioned table, where the data is split into buckets, the last member to exit that hosts a particular bucket leaves the most recent data on disk for that bucket.
Note: Peer clients rely on data stores for persistence. See Peer Client Considerations for Persistent Data on page 92.
Startup Process
When you start a member with disk stores, the stores are loaded back into memory to initialize the member's table data.
Note: Peer clients rely on data stores for persistence. See Peer Client Considerations for Persistent Data on page 92.
If the member does not hold all of the most recent data in the system:
1. The member does not immediately join the server group, but waits for the member with the most recent data. If your log level is info or below, the system provides messaging about the wait. Here, the disk store for hostA has the most recent data and the hostB member is waiting for it.
[info 2010/04/09 10:48:26.039 PDT CacheRunner tid=0x1] Region /persistent_PR initialized with data from /10.80.10.64:/export/straw3/users/jpearson/GemFireTesting/hostB/ backupDirectory created at timestamp 1270834766425 version 0 is waiting for the data previously hosted at [/10.80.10.64:/export/straw3/users/jpearson/GemFireTesting/hostA/ backupDirectory created at timestamp 1270834763353 version 0] to be available
During normal startup you can expect to see some waiting messages.
2. When the most recent data is available, the system updates the local tables as needed, logs a message like this, and continues with startup.
[info 2010/04/09 10:52:13.010 PDT CacheRunner tid=0x1] Done waiting for the remote data to be available.
Each member's persistent tables load and go online as quickly as possible, not waiting unnecessarily for other members to complete. For performance reasons, several actions are taken asynchronously:
• If both primary and secondary buckets are persisted, data is made available when the primary buckets are loaded without waiting for the secondary buckets to load. The secondary buckets will load asynchronously.
• Entry keys first get loaded from the key file if this file is available (see information about the krf file in Disk Store File Names and Extensions on page 87). Once all keys are loaded, SQLFire loads the entry values asynchronously. If a value is requested before it is loaded, the value will immediately be fetched from the disk store.
Example Startup Scenarios • Stop order for a replicated, persistent table: 1. Member A (MA) exits first, leaving persisted data on disk for TableP. 2. Member B (MB) continues to run DML operations on TableP, which update its disk store and leaves the disk store for MA in a stale condition. 3. MB exits, leaving the most up-to-date data on disk for Table P. • Restart order Scenario 1: 1. MB is started first. SQLFire recognizes MB as having the most recent disk data for TableP and initializes it from disk. 2. MA is started, recovers its data from disk, and updates it as needed from the data in MB. • Restart order Scenario 2: 1. MA is started first. SQLFire recognizes that MA does not have the most recent disk store data and waits for MB to start before creating TableP in MA. 2. MB is started. SQLFire recognizes MB as having the most recent disk data for TableP and initializes it from disk. 3. MA recovers its TableP data from disk and updates it as needed from the data in MB. Start a System with Disk Stores When you start a SQLFire cluster with disk stores, it is recommended that you start all members with persisted data at the same time. Procedure 1. Start members with persisted data at the same time. When members with persistent data boot, they determine which have the most recent table data, and they initialize their caches using that data. If you do not start persistent data stores in parallel, some members may hang while waiting for other members to start. The following example bash script starts members in parallel. The script waits for the startup to finish and exits with an error status if one of the jobs fails. #!/bin/bash # Start all local SQLFire members to waiting state, regardless of which member holds the most recent # disk store files: ssh servera "sqlf locator start -dir=/locator1 -sync=false" ssh serverb "sqlf server start -client-port=1528 -locators=localhost[10334] -dir=/server1 -sync=false" ssh serverc "sqlf server start -client-port=1529 -locators=localhost[10334] -dir=/server2 -sync=false" # Wait until all members have finished synchronizing and starting: ssh servera "sqlf locator wait -dir=/locator1" ssh serverb "sqlf server wait -dir=/server1" ssh serverc "sqlf server wait -dir=/server2" # Continue any additional tasks that require access to the SQLFire members... [...] 95
2. Respond to any member startup hangs. If a disk store with the most recent table data does not come online, other members wait indefinitely rather than come online with stale data. Check for missing disk stores with the sqlf list-missing-disk-stores command. See Handling Missing Disk Stores on page 102. a. If no disk stores are missing, your peer initialization may be slow for some other reason. Check to see if other members are hanging or fail to start. b. If disk stores are missing that you think should be there: a. Make sure you have started the member. Check the logs for any failure messages. b. Make sure your disk store files are accessible. If you have moved your member or disk store files, you must update your disk store configuration to match. c. If disk stores are missing that you know are lost, because you have deleted them or their files are otherwise unavailable, revoke them so the startup can continue. See Handling Missing Disk Stores on page 102.
Disk Store Management
The sqlf command-line tool has several options for examining and managing your disk stores. The sqlf tool, along with the CREATE DISKSTORE statement, is your management tool for online and offline disk stores.
Note: Each of these commands operates either on the online disk stores or offline disk stores, but not both.

sqlf Command                   Online or Offline Command   See . . .
validate-disk-store            Offline                     Validate a Disk Store on page 97
compact-all-disk-stores        Online                      Compacting Disk Store Log Files on page 97
compact-disk-store             Offline                     Compacting Disk Store Log Files on page 97
backup                         Online                      Backing Up and Restoring Online Disk Stores on page 100
list-missing-disk-stores       Online                      Handling Missing Disk Stores on page 102
revoke-missing-disk-store      Online                      Handling Missing Disk Stores on page 102
For the complete command syntax of any sqlf command, run sqlf --help at the command line.
Online sqlf Disk Store Operations
For online operations, sqlf connects to a distributed system and sends the operation requests to the members that have disk stores. These commands do not run on offline disk stores. You must provide the command with distributed system properties in a sqlfire.properties file, or specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port=port_number).
Offline sqlf Disk Store Operations
For offline operations, sqlf runs the command against the specified disk store and its specified directories. You must specify all directories for the disk store. Offline operations will not run on online disk stores. The tool locks the disk store while it is running, so the member cannot start in the middle of an operation.
If you try to run an offline command for an online disk store, you get a message like this: ERROR: Operation "validate-disk-store" failed because: disk-store=ds1: com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: ds1: Could not lock "hostA/ds1dir1/DRLK_IFds1.lk". Other JVMs might have created diskstore with same name using the same directory., caused by java.io.IOException: The file "hostA/ds1dir1/DRLK_IFds1.lk" is being used by another process. Validate a Disk Store The sqlf validate-disk-store command verifies the health of your offline disk store. It gives you information about the tables in it, the total rows, and the number of records that would be removed if you compacted the store. When to use this command: • Before compacting an offline disk store to help decide whether it’s worth doing. • Before restoring a disk store. • Any time you want to be sure the disk store is in good shape. Example sqlf validate-disk-store ds1 hostB/bupDirectory /partitioned_table entryCount=6 bucketCount=10 Disk store contains 1 compactable records. Total number of table entries in this disk store is: 6 Compacting Disk Store Log Files You can configure automatic compaction for an operation log based on percentage of garbage content. You can also request compaction manually for online and offline disk stores. The following topics deal with compaction: • How Compaction Works on page 97 • Online Compaction Diagram on page 98 • Run Online Compaction on page 98 • Run Offline Compaction on page 99 • Performance Benefits of Manual Compaction on page 99 • Directory Size Limits on page 99 • Example Compaction Run on page 99 How Compaction Works When a DML operation is added to a disk store, any preexisting operation record for the same record becomes obsolete, and SQLFire marks it as garbage. For example, when you update a record, the update operation is added to the store. If you delete the record later, the delete operation is added and the update operation becomes garbage. SQLFire does not remove garbage records as it goes, but it tracks the percentage of garbage in each operation log, and provides mechanisms for removing garbage to compact your log files. SQLFire compacts an old operation log by copying all non-garbage records into the current log and discarding the old files. As with logging, oplogs are rolled as needed during compaction to stay within the MAXLOGSIZE setting. You can configure the system to automatically compact any closed operation log when its garbage content reaches a certain percentage. You can also manually request compaction for online and offline disk stores. For
the online disk store, the current operation log is not available for compaction, no matter how much garbage it contains. Online Compaction Diagram
Offline compaction runs essentially in the same way, but without the incoming DML operations. Also, because there is no current open log, the compaction creates a new one to get started. Run Online Compaction Old log files become eligible for online compaction when their garbage content surpasses a configured percentage of the total file. A record is garbage when its operation is superseded by a more recent operation for the same record. During compaction, the non-garbage records are added to the current log along with new DML operations. Online compaction does not block current system operations. • Run automatic compaction. When AUTOCOMPACT is true, SQLFire automatically compacts each oplog when its garbage content surpasses the COMPACTIONTHRESHOLD. Automatic compaction takes cycles from your other operations, so you may want to disable it and only do manual compaction, to control the timing. • Run manual compaction. To run manual compaction: • Set the disk store attribute ALLOWFORCECOMPACTION to true. This causes SQLFire to maintain extra data about the files so that it can compact on demand. This is disabled by default to save space. You can run manual online compaction at any time while the system is running. Oplogs eligible for compaction based on the COMPACTIONTHRESHOLD are compacted into the current oplog. • Run manual compaction as needed. You can compact all online disk stores in a distributed system from the command-line. For example: sqlf compact-all-disk-stores Note: This sqlf command requires a local sqlfire.properties file that contains properties to locate the distributed system. Or, specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port= port_number).
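The two compaction styles described above map directly to disk store attributes. The values below are illustrative only; see CREATE DISKSTORE on page 494 for the attribute list and defaults.

-- Automatic compaction: compact an oplog once half of its content is garbage.
CREATE DISKSTORE AutoCompactStore
  AUTOCOMPACT TRUE
  COMPACTIONTHRESHOLD 50
  MAXLOGSIZE 128
  ('/data/sqlfire/auto');

-- Manual compaction only: disable automatic compaction but keep the extra
-- bookkeeping needed to force online compaction on demand.
CREATE DISKSTORE ManualCompactStore
  AUTOCOMPACT FALSE
  ALLOWFORCECOMPACTION TRUE
  ('/data/sqlfire/manual');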
Run Offline Compaction
Offline compaction is a manual process. All log files are compacted as much as possible, regardless of how much garbage they hold. Offline compaction creates new log files for the compacted log records. Use this syntax to compact individual offline disk stores:
sqlf compact-disk-store myDiskStoreName /firstDir /secondDir -maxOplogSize=maxMegabytesForOplog
You must provide all of the directories in the disk store. If no oplog max size is specified, SQLFire uses the system default. Offline compaction can take a lot of memory. If you get a java.lang.OutOfMemory error while running this, you may need to increase your heap size. See the sqlf command help for instructions on how to do this.
Performance Benefits of Manual Compaction
You can improve performance during busy times if you disable automatic compaction and run your own manual compaction during lighter system load or during downtimes. You could run the API call after your application performs a large set of data operations. You could run sqlf compact-all-disk-stores every night when system use is very low. To follow a strategy like this, you need to set aside enough disk space to accommodate all non-compacted disk data. You might need to increase system monitoring to make sure you do not overrun your disk space. You may be able to run only offline compaction. If so, you can set ALLOWFORCECOMPACTION to false and avoid storing the information required for manual online compaction.
Directory Size Limits
If you reach the disk directory size limits during compaction:
• For automatic compaction, the system logs a warning, but does not stop.
• For manual compaction, the operation stops and returns a DiskAccessException to the calling process, reporting that the system has run out of disk space.
Example Compaction Run
In this example offline compaction run listing, the disk store compaction had nothing to do in the *_3.* files, so they were left alone. The *_4.* files had garbage records, so the oplog from them was compacted into the new *_5.* files.
bash-2.05$ ls -ltra backupDirectory
total 28
-rw-rw-r--   1 jpearson users        3 Apr  7 14:56 BACKUPds1_3.drf
-rw-rw-r--   1 jpearson users       25 Apr  7 14:56 BACKUPds1_3.crf
drwxrwxr-x   3 jpearson users     1024 Apr  7 15:02 ..
-rw-rw-r--   1 jpearson users     7085 Apr  7 15:06 BACKUPds1.if
-rw-rw-r--   1 jpearson users       18 Apr  7 15:07 BACKUPds1_4.drf
-rw-rw-r--   1 jpearson users     1070 Apr  7 15:07 BACKUPds1_4.crf
drwxrwxr-x   2 jpearson users      512 Apr  7 15:07 .

bash-2.05$ sqlf validate-disk-store ds1 backupDirectory
/root: entryCount=6
/partitioned_region entryCount=1 bucketCount=10
Disk store contains 12 compactable records.
Total number of region entries in this disk store is: 7

bash-2.05$ sqlf compact-disk-store ds1 backupDirectory
Offline compaction removed 12 records.
Total number of region entries in this disk store is: 7

bash-2.05$ ls -ltra backupDirectory
total 16
-rw-rw-r--   1 jpearson users        3 Apr  7 14:56 BACKUPds1_3.drf
-rw-rw-r--   1 jpearson users       25 Apr  7 14:56 BACKUPds1_3.crf
drwxrwxr-x   3 jpearson users     1024 Apr  7 15:02 ..
-rw-rw-r--   1 jpearson users        0 Apr  7 15:08 BACKUPds1_5.drf
-rw-rw-r--   1 jpearson users      638 Apr  7 15:08 BACKUPds1_5.crf
-rw-rw-r--   1 jpearson users     2788 Apr  7 15:08 BACKUPds1.if
drwxrwxr-x   2 jpearson users      512 Apr  7 15:09 .
bash-2.05$
Backing Up and Restoring Online Disk Stores When you invoke the sqlf backup command, SQLFire backs up disk stores for all members that are running in the distributed system at that time. Each member with persistent data creates a backup of its own configuration and disk stores. • Backup Guidelines and Prerequisites on page 100 • Back Up an Online Disk Store on page 100 • What the Online Backup Saves on page 101 • Restore an Online Backup on page 101 Backup Guidelines and Prerequisites • Run the backup during a period of low activity in your system. The backup does not block any activities in the distributed system, but it does use file system resources on all hosts in your distributed system and can affect performance. • Optionally, compact your disk store before running the backup. See Compacting Disk Store Log Files on page 97. • Only use the sqlf backup command to create backup files from a running distributed system. Do not try to create backup files from a running system using file copy commands. You will get incomplete and unusable copies. • Back up to a directory that all members can access. Make sure the directory exists and has the proper permissions for your members to write to it and create subdirectories. • Make sure there is a sqlfire.properties file for the distributed system in the directory where you will run the sqlf command, or specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port= port_number). The command will back up all disk stores in the specified distributed system. • The directory you specify for backup can be used multiple times. Each backup first creates a top level directory for the backup, under the directory you specify, identified to the minute. You can specify a directory by one of two methods; the command in the procedure below uses the first method. • Use a single physical location, such as a network file server. Example: /export/fileServerDirectory/sqlfBackupLocation • Use a directory that is local to all host machines in the system. Example: ./sqlfBackupLocation
Back Up an Online Disk Store 1. Run the backup command, providing your backup directory location. Example: sqlf backup /export/fileServerDirectory/gemfireBackupLocation -locators=warsaw.vmware.com[26340]
2. Read the message that reports on the success of the operation. If the operation is successful, you see a message like this: Connecting to distributed system: locators=warsaw.vmware.com[26340] The following disk stores were backed up: DiskStore at hosta.vmware.com /home/dsmith/dir1 DiskStore at hostb.vmware.com /home/dsmith/dir2 Backup successful. If the operation does not succeed at backing up all known members, you see a message like this: Connecting to distributed system: locators=warsaw.vmware.com[26357] The following disk stores were backed up: DiskStore at hosta.vmware.com /home/dsmith/dir1 DiskStore at hostb.vmware.com /home/dsmith/dir2 The backup may be incomplete. The following disk stores are not online: DiskStore at hostc.vmware.com /home/dsmith/dir3 A member that fails to complete its backup is noted in this ending status message and leaves the file INCOMPLETE_BACKUP in its highest level backup directory. Offline members leave nothing, so you only have this message from the backup operation itself. Although offline members cannot back up their disk stores, a complete backup can be obtained if at least one copy of the data is available in a running member. What the Online Backup Saves For each member with persistent data, the backup includes: • Disk store files for all stores containing persistent tables. • Configuration files from the member startup (sqlfire.properties). These configuration files are not automatically restored, to avoid interfering with any more recent configurations. In particular, if these are extracted from a master jar file, copying the separate files into your working area could override the files in the jar. • A restore script, written for the member’s operating system, that copies the files back to their original locations. For example, in Windows, the file is restore.bat and in Linux, it is restore.sh. Restore an Online Backup The restore script included in the online backup copies files back to their original locations. You can do this manually if you wish. Prerequisites Your members are offline and the system is down. Procedure 1. Read the restore scripts to see where they will place the files and make sure the destination locations are ready. The restore scripts refuse to copy over files with the same names. 2. Run each restore script on the host where the backup originated. In Windows, the file is restore.bat and in Linux, it is restore.sh. The restore copies the files back to their original location. Backing Up and Restoring Offline Disk Stores With the system offline, you copy and restore your files using your file system commands. • Offline File Backup and Restore on page 101 • Offline Members: Manual Catch-Up to an Online Backup on page 102
Offline File Backup and Restore
To back up your offline system:
1. Validate, and consider compacting your disk stores before backing them up. See Compacting Disk Store Log Files on page 97.
2. Copy all disk store files, and any other files you want to save, to your backup locations.
To restore a backup of an offline system:
1. Make sure the system is either down or not using the directories you will use for the restored files.
2. Make sure your members are configured to use the directories where you put the files.
3. Reverse your backup file copy procedure, copying all the backed up files into the directories you want to use.
4. Start the system members.
Offline Members: Manual Catch-Up to an Online Backup
If you must have a member offline during an online backup, you can manually back up its disk stores. Do one of the following:
• Keep the member's backup and restore separated, doing offline manual backup and offline manual restore, if needed.
• Bring this member's files into the online backup framework manually and create a restore script by hand, from a copy of another member's script:
1. Duplicate the directory structure of a backed up member for this member.
2. Rename directories as needed to reflect this member's particular backup, including disk store names.
3. Clear out all files but the restore script.
4. Copy in this member's files.
5. Modify the restore script to work for this member.
Handling Missing Disk Stores Use the sqlf list-missing-disk-stores command to find disk stores with the latest persisted data. Use sqlf revoke-missing-disk-stores to stop waiting for the data when you cannot bring it online. • Why Disk Stores Go Missing on page 102 • List Missing Disk Stores on page 103 • Revoke Missing Disk Stores on page 103 Why Disk Stores Go Missing Disk stores usually go missing because their member fails to start. The member can fail to start for a number of reasons, including: • Disk store file corruption. You can check on this by validating the disk store. • Incorrect distributed system configuration for the member • Network partitioning • Drive failure Note: The disk store directories listed for missing disk stores may not be the directories you have currently configured for the member. The list is retrieved from the other running members—the ones who are reporting the missing member. They have information from the last time the missing disk store was online. If you move your files and change the member’s configuration, these directory locations will be stale.
List Missing Disk Stores
The sqlf list-missing-disk-stores command lists all disk stores with most recent data that are being waited on by other members.
For replicated tables, this command only lists missing members that are preventing other members from starting up. For partitioned tables, this command also lists any offline data stores, even when other data stores for the table are online, because their offline status may be causing PartitionOfflineExceptions in cache operations or preventing the system from satisfying redundancy.
Example:
sqlf list-missing-disk-stores
Connecting to distributed system: mcast=/239.192.81.2:12348
DiskStore at straw.gemstone.com /export/straw3/users/jpearson/testGemFire/hostB/DS1
Note: Make sure this sqlf call can find a sqlfire.properties file to locate the distributed system. Or, specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port=port_number).
Revoke Missing Disk Stores
This section applies to disk stores for which both of the following are true:
• Disk stores that have the most recent copy of data for one or more tables or table buckets.
• Disk stores that are unrecoverable, such as when you have deleted them, or their files are corrupted or on a disk that has had a catastrophic failure.
When you cannot bring the latest persisted copy online, use the revoke command to tell the other members to stop waiting for it. Once the store is revoked, the system finds the remaining most recent copy of data and uses that.
Note: Once revoked, a disk store cannot be reintroduced into the system.
Use the sqlf list-missing-disk-stores command to identify the disk store you need to revoke. The revoke command takes the host and directory as input, as listed by that command.
Example:
sqlf list-missing-disk-stores
Connecting to distributed system: mcast=/239.192.81.2:12348
DiskStore at straw.gemstone.com /export/straw3/users/jpearson/testGemFire/hostB/DS1
sqlf revoke-missing-disk-store straw.gemstone.com /export/straw3/users/jpearson/testGemFire/hostB/DS1
Connecting to distributed system: mcast=/239.192.81.2:12348
revocation was successful ...
Note: Make sure this sqlf call can find a sqlfire.properties file to locate the distributed system. Or, specify the multicast port or locator properties to connect to the cluster (for example, -mcast-port=port_number).
Chapter 15: Exporting and Importing Data with vFabric SQLFire
You can use either sqlf commands or Apache DdlUtils to import data into SQLFire. SQLFire installs and uses DdlUtils for sqlf import and export functionality.
Using SQLF Commands to Export and Import Data The sqlf utility provides several commands that use the DdlUtils 1.1 API to export and import database schemas and table data. You can use these sqlf commands with SQLFire and other JDBC datasources. Note: As a best practice, use the SYS.CREATE_ALL_BUCKETS procedure to pre-allocate partitioned table buckets before you import data into a partitioned table. This helps to ensure that partitioned table data is evenly distributed throughout the distributed system, even if you import table data using concurrent processes. These sqlf commands are used to write existing database information to files: • sqlf write-schema-to-sql—Writes a database schema to a file as SQL DDL statements. • sqlf write-schema-to-xml—Writes a database schema to a schema XML file. • sqlf write-data-to-xml—Writes table data to a data XML file. • sqlf write-data-dtd-to-file—Writes a Document Type Definition (DTD) that describes the layout of a data XML file (created using sqlf write-data-to-xml). These sqlf commands are used to import database information from files: • sqlf write-schema-to-db—Creates a database schema from a schema XML file. • sqlf write-data-to-db—Inserts table data into a schema using one or more data XML files and schema XML files. When you migrate a third-party database schema to SQLFire, use sqlf write-schema-to-sql and then modify the SQL statements to include SQLFire-specific features such as table partitioning and replication. Then use an interactive sqlf session to execute the script in SQLFire. See run on page 470. When you migrate a schema from one SQLFire system to another, use sqlf write-schema-to-xml or use sqlf write-schema-to-sql with the -dump-from-db option to include SQLFire-specific extensions in the DDL commands. Note: Databases such as Oracle 11g contain system tables with data types that are incompatible with the DdlUtils 1.1 API. To export schemas or data from these databases, you must use the -schema-pattern option with the sqlf command to exclude schemas that contain incompatible data types. See sqlf write-schema-to-xml.
The sections that follow describe how to use the above sqlf commands to migrate a third-party database to SQLFire.
Export, Alter, and Import a Database Schema Using SQLF
Use sqlf commands to export the schema and data from a third-party database, and then import the schema and data to vFabric SQLFire.
Procedure
1. To use the sqlf export commands with a third-party database, you require a JDBC driver and connection URL for the database. Use a Java client application such as SQuirreL SQL to verify that you can connect to the database. This procedure uses MySQL as an example datasource. The components necessary to establish a JDBC connection to the example server are:
• Driver JAR file: mysql-connector-java-5.1.18-bin.jar
• Driver class: com.mysql.jdbc.Driver
• Connection URL: jdbc:mysql://localhost:3306/sqlfdb?user=username&password=password
To ensure that sqlf can access the JDBC driver class, add the JAR location to your CLASSPATH. For example, open a new command prompt and enter:
export CLASSPATH=$CLASSPATH:/path/mysql-connector-java-5.1.18-bin.jar
2. Add the SQLFire /bin directory to your path if you have not already done so. For example:
export PATH=$PATH:~/vFabric_SQLFire_11_bNNNNN/bin
3. Use sqlf to export the schema of the third-party data source to a schema SQL file in a format that is compatible with SQLFire. For example:
sqlf write-schema-to-sql -file=mysql-schema.sql -to-database-type=sqlfire -driver-class=com.mysql.jdbc.Driver -url=jdbc:mysql://localhost:3306/sqlfdb?user=username&password=password
Note: Databases such as Oracle 11g contain system tables with data types that are incompatible with the DdlUtils 1.1 API. To export schemas or data from these databases, you must use the -schema-pattern option with the sqlf command to exclude schemas that contain incompatible data types. See sqlf write-schema-to-xml.
4. Edit the CREATE TABLE statements in the resulting schema SQL file to use SQLFire-specific clauses. For example, use syntax to specify colocation for partitioning, persist tables, associate tables with gateways, and so forth (a sketch of this kind of edit follows this procedure).
5. After editing the SQL script file, use an interactive sqlf session to execute the script in SQLFire:
sqlf connect client 'localhost:1527';
run 'mysql-schema.sql';
6. Pre-allocate buckets for partitioned table data before you import the data, and exit sqlf:
call sys.create_all_buckets;
exit;
7. To import the data from the third-party datasource, first use these sqlf commands to export both the data and schema to XML files:
sqlf write-schema-to-xml -file=mysql-schema.xml -driver-class=com.mysql.jdbc.Driver
-url=jdbc:mysql://localhost:3306/sqlfdb?user=username&password=password sqlf write-data-to-xml -file=mysql-data.xml -driver-class=com.mysql.jdbc.Driver -url=jdbc:mysql://localhost:3306/sqlfdb?user=username&password=password 8. Use the sqlf write-data-to-db command and specify both the data XML file and the schema XML file to import the data to SQLFire: sqlf write-data-to-db -files=mysql-data.xml -schema-files=mysql-schema.xml -client-bind-address=localhost -client-port=1527 9. If necessary, manually add triggers, views, and Java stored procedures in the SQLFire database.
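The edit called for in step 4 typically amounts to appending SQLFire-specific clauses to otherwise plain DDL. The tables and clauses below are a hypothetical before-and-after sketch; the partitioning, colocation, redundancy, and persistence clauses are documented under CREATE TABLE on page 505.

-- As generated by sqlf write-schema-to-sql (no SQLFire-specific clauses):
CREATE TABLE customers (cust_id INT NOT NULL PRIMARY KEY, name VARCHAR(100));
CREATE TABLE orders (order_id INT NOT NULL PRIMARY KEY, cust_id INT, amount DECIMAL(10,2));

-- After editing for SQLFire: partition, colocate related data, and persist.
CREATE TABLE customers (cust_id INT NOT NULL PRIMARY KEY, name VARCHAR(100))
  PARTITION BY PRIMARY KEY REDUNDANCY 1 PERSISTENT;
CREATE TABLE orders (order_id INT NOT NULL PRIMARY KEY, cust_id INT, amount DECIMAL(10,2))
  PARTITION BY COLUMN (cust_id) COLOCATE WITH (customers) REDUNDANCY 1 PERSISTENT;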
Using Apache DdlUtils to Import Data
The SQLFire installation includes Apache DdlUtils 1.1, and you can use DdlUtils in combination with Apache Ant to import data into SQLFire. The information in these sections is intended for customers who are familiar with DdlUtils and want to use DdlUtils in place of the available sqlf commands.
Overview of DdlUtils
DdlUtils is a small, easy-to-use component for working with Database Definition (DDL) files. DdlUtils is an Apache project. Use it to:
• Generate DDL scripts from an existing database by using Ant tasks or a Java API.
• Export and import data to/from XML from a supported database to another supported database (including SQLFire).
• Take XML files that contain the definition of a database schema, for example, tables and columns. These files can be fed into DdlUtils through its Ant task or programmatically in order to create the corresponding database or alter it so that it corresponds to the DDL. Likewise, DdlUtils can generate a DDL file for an existing database.
Note: The instructions in this section use a version of DdlUtils that has been modified to support SQLFire and its SQL dialect. This version of DdlUtils is installed with SQLFire, and is available in the vFabric_SQLFire_11_bNNNNN/lib/ddlutils directory. Ensure that you use the installed version of DdlUtils, rather than the standard Apache version, when performing the instructions.
A common usage pattern with SQLFire is as a database cache in front of a relational data store. DdlUtils provides a way to export the schema definition from a relational database as a SQL script so the developer can tailor it for partitioning and replication and then import the schema into SQLFire. DdlUtils can optionally be used to export from an external database and import the same into SQLFire.
Note: If you attempt to export and import data that includes auto-generated primary key columns, the data row import fails. SQLFire adds distributed system data into generated keys, and this precludes referencing those generated keys by subsequent imported data rows.
Export, Alter, and Import a Database Schema Using DdlUtils
Use DdlUtils to export, modify, and import data.
Procedure
1. Download and install Apache Ant if it is not already available on your system.
2. Ensure that your CLASSPATH includes the JAR file that contains the third-party database's JDBC driver.
3. Read the available DdlUtils documentation to understand the various Ant tasks. Documentation is installed in the ddlutils/DdlUtils-1.0-doc.zip file and is also available at the DdlUtils project site. This step is not required if you only want to export DDL from a schema and then apply it to SQLFire. 4. Make sure that you can connect to the external database and to SQLFire servers. You will need to supply the connection URL and properties for both databases when using DdlUtils. 5. Edit the build.xml file in the ddlutils/example directory to specify the JDBC connection information for both SQLFire and the database from which you are importing data. The example build file contains sample entries for SQLFire and MySQL. You will need to change each occurrence of the JDBC URLs. The build.xml contents are shown in the Example build.xml File on page 108. 6. By default, the usedelimitedsqlidentifiers property is set to false, which means that tables, columns, and other object names are simple references with no need to place the names in quotation marks. However, if your schema contains table or column names with embedded spaces, change this property to "true" and use single quotation marks around the table and column names each time you reference them. 7. Databases such as Oracle 11g contain system tables with data types that are incompatible with the DdlUtils 1.1 API. To export schemas or data from these databases, you must use the schemapattern property to exclude schemas that contain incompatible data types. 8. In the ddlutils/example directory, run the following command to run the three Ant tasks that generate a SQL DDL script that is compatible with SQLFire syntax: ant writeDDLToXML createDBFromXML writeDDLToSQL 9. Edit the CREATE TABLE statements in the resulting script to use SQLFire-specific clauses. For example use syntax to use colocation for partitioning, persist tables, associate tables with gateways, and so forth. 10. Use the sqlf utility to run the modified script in SQLFire. For example: sqlf connect client 'localhost:1527'; run 'db-schema1.sql'; exit; 11. To import data from the database, use the ImportDataToDB target in the example build.xml file. 12. If necessary, manually add triggers, views, and Java stored procedures in the SQLFire database. Example build.xml File View a sample build file for using DdlUtils with SQLFire. -->
Exporting and Importing Data from Text Files SQLFire provides several procedures to help you import and export delimited data in text files. You can use these procedures to transfer data between various programs and SQLFire. For example, you can use the procedures to import data as comma-separated values (a CSV file). Note: As a best practice, use the SYS.CREATE_ALL_BUCKETS procedure to pre-allocate partitioned table buckets before you import data into a partitioned table. This helps to ensure that partitioned table data is evenly distributed throughout the distributed system, even if you import table data using concurrent processes. These procedures are used to export and import data using delimited text files: • SYSCS_UTIL.EXPORT_QUERY—Writes the results of a SQLFire query to a text file using delimiters that you specify. • SYSCS_UTIL.EXPORT_TABLE—Writes the data from a SQLFire table to a text file using delimiters that you specify.
• SYSCS_UTIL.IMPORT_TABLE_EX—Writes data from a delimited text file to all columns of a SQLFire table. • SYSCS_UTIL.IMPORT_DATA_EX—Writes data from a delimited text file to a subset of columns in a SQLFire table. See Standard Built-in Procedures on page 609 for examples that use these procedures.
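As a hedged sketch of calling the export procedures from an interactive sqlf session: the schema, table, and file names are hypothetical, and the argument order shown (schema, table or query, file name, column delimiter, character delimiter, codeset) follows the Derby-style convention; check Standard Built-in Procedures on page 609 for the exact signatures, particularly for the IMPORT_*_EX procedures, which take additional arguments.

-- Export a whole table as comma-separated values (hypothetical names):
CALL SYSCS_UTIL.EXPORT_TABLE('APP', 'ORDERS', '/tmp/orders.csv', ',', '"', NULL);

-- Export the result of a query:
CALL SYSCS_UTIL.EXPORT_QUERY('SELECT * FROM APP.ORDERS WHERE AMOUNT > 100',
  '/tmp/big_orders.csv', ',', '"', NULL);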
Chapter 16: Using Table Functions to Import Data as SQLFire Tables
A SQLFire table function lets you efficiently import foreign data into SQLFire tables. Table functions let you join SQLFire tables with any of the following data sources:
• XML-formatted reports and logs
• Queries that run in foreign databases
• Streaming data from sensors
• RSS feeds
The data imported by a table function acts like a SQLFire replicated table that has no indexes. All data is fetched on every SQLFire member where a query against the table is executed. Outer joins that involve a partitioned table and a table function have limitations similar to joins with replicated tables (duplicate values are returned from the replicated table or table function).
See CREATE FUNCTION on page 496 for the complete syntax needed to declare SQLFire table functions. The following topics provide information on how to write Java methods that wrap foreign data sources inside JDBC ResultSets.
Overview of SQLFire Table Functions A SQLFire table function is a Java method that returns a JDBC ResultSet. Most of the ResultSet methods can be written as stubs that simply raise exceptions. However, a SQLFire table function must implement the following ResultSet methods: • next() • close() • wasNull() • getXXX() - When invoking a SQLFire table function at runtime, SQLFire calls a getXXX() method on each referenced column. The particular getXXX() method is based on the column's data type as declared in the CREATE FUNCTION statement. Preferred getXXX() methods for SQLFire Table Functions on page 114 explains how SQLFire selects an appropriate getXXX() method. However, nothing prevents application code from calling other getXXX() methods on the ResultSet. The returned ResultSet needs to implement the getXXX() methods which SQLFire will call as well as all getXXX() methods which the application will call.
A SQLFire table function is materialized by a public static method that returns a ResultSet: public static ResultSet read() {...} The public static method is then bound to a SQLFire function name: CREATE FUNCTION externalEmployees () RETURNS TABLE ( employeeId INT, lastName VARCHAR( 50 ), firstName VARCHAR( 50 ), birthday DATE ) LANGUAGE JAVA PARAMETER STYLE DERBY_JDBC_RESULT_SET READS SQL DATA EXTERNAL NAME 'com.acme.hrSchema.EmployeeTable.read' To invoke a table function, wrap it in a TABLE constructor in the FROM list of a query. Note that the table alias (in this example "s") is a required part of the syntax: INSERT INTO employees SELECT s.* FROM TABLE (externalEmployees() ) s; Preferred getXXX() methods for SQLFire Table Functions While scanning a table function, SQLFire calls a preferred getXXX() method for each column, based on the column's data type. The following table lists the preferred getXXX() method for each SQLFire data type. Table 3: getXXX() Methods Called for Declared SQL Types
Column Type Declared by CREATE FUNCTION     getXXX() Method Called by SQLFire for JDBC 3.0 and 4.0
BIGINT                                      getLong()
BLOB                                        getBlob()
CHAR                                        getString()
CHAR FOR BIT DATA                           getBytes()
CLOB                                        getClob()
DATE                                        getDate()
DECIMAL                                     getBigDecimal()
DOUBLE                                      getDouble()
DOUBLE PRECISION                            getDouble()
FLOAT                                       getDouble()
INTEGER                                     getInt()
LONG VARCHAR                                getString()
LONG VARCHAR FOR BIT DATA                   getBytes()
NUMERIC                                     getBigDecimal()
REAL                                        getFloat()
SMALLINT                                    getShort()
TIME                                        getTime()
TIMESTAMP                                   getTimestamp()
VARCHAR                                     getString()
VARCHAR FOR BIT DATA                        getBytes()
XML                                         Not supported
Example Table Function The following simple table function selects rows from a foreign database. package com.acme.hrSchema; import java.sql.*; /** * Sample Table Function for reading the employee table in an * external database. */ public class EmployeeTable { public static ResultSet read() throws SQLException { Connection conn = getConnection(); PreparedStatement ps = conn.prepareStatement( "select * from hrSchema.EmployeeTable" ); return ps.executeQuery(); } protected static Connection getConnection() throws SQLException { String EXTERNAL_DRIVER = "com.mysql.jdbc.Driver"; try { Class.forName( EXTERNAL_DRIVER ); } catch (ClassNotFoundException e) { throw new SQLException( "Could not find class " + EXTERNAL_DRIVER ); } Connection conn = DriverManager.getConnection ( "jdbc:mysql://localhost/hr?user=root&password=mysql-passwd" ); return conn; } }
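To show the declared function in use, the queries below assume the externalEmployees function created earlier in this chapter; the departments table and its columns are hypothetical.

-- Query the table function directly (the table alias, here s, is required):
SELECT s.firstName, s.lastName
FROM TABLE (externalEmployees()) s
WHERE s.birthday > DATE('1980-01-01');

-- Join the table function with an ordinary SQLFire table:
SELECT s.lastName, d.deptName
FROM TABLE (externalEmployees()) s, departments d
WHERE d.managerId = s.employeeId;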
Part 5: Developing Applications with SQLFire
Developing Applications with vFabric SQLFire explains the main concepts of programming Java and ADO.NET applications with SQLFire APIs and the SQLFire implementation of SQL. It describes how to embed a SQLFire instance in a Java application and how to connect as a client. This guide also describes transactional and non-transactional behavior, and explains how to create data-aware stored procedures.
Topics:
• Starting SQLFire Servers with the FabricServer Interface
• Developing Java Clients and Peers
• Configuring SQLFire as a JDBC Datasource
• Using SQLFire with Hibernate
• Storing and Loading JAR Files in SQLFire
• Developing ADO.NET Client Applications
• Using SQLFire.NET Designer
• Understanding the Data Consistency Model
• Using Distributed Transactions in Your Applications
• Using Data-Aware Stored Procedures
• Using the Procedure Provider API
• Using the Custom Result Processor API
• Programming User-Defined Types
• Using Result Sets and Cursors
Using SQLFire with Hibernate
VMware provides a Hibernate dialect file that defines the SQLFire variant of the SQL language. You can use this file along with the SQLFire JDBC driver to configure SQLFire as a database for developing Hibernate projects. See vFabric SQLFire Hibernate Dialect at the VMware Community site for information about obtaining and using the SQLFire dialect file with your Hibernate projects.
Chapter 17: Starting SQLFire Servers with the FabricServer Interface
The FabricServer interface provides an easy way to start an embedded SQLFire server process in an existing Java application. You generally use the FabricServer interface when you want to provide thin client connectivity to the embedded SQLFire member. The FabricServer interface enables you to start multiple network services to listen for clients on different address and port combinations. Using the FabricServer interface also enables you to initialize resources in the SQLFire server member before you start network services and make the member available to client connections.
Note: Although the FabricServer interface supports starting an embedded locator service, production systems should always use standalone locators. See Start and Stop Locators on page 229.
Procedure
To start a SQLFire server using the FabricServer interface:
1. Use the FabricServiceManager factory class to obtain the singleton instance of FabricServer. For example: FabricServer server = FabricServiceManager.getFabricServerInstance();
2. Create a Java Properties object and add all of the boot property definitions that you want to configure when starting the server. For example: Properties bootProps = new Properties(); bootProps.setProperty("mcast-port", "12444"); As an alternative, you can define properties as system properties (pass with the -D option to the JVM), configure them in a properties file, or rely on default property values. Configuration Properties on page 327 provides more information.
3. Use the FabricServer.start() method to start the server using your Properties object: server.start(bootProps);
4. To support client connections, start a network service on a unique client and port combination using the startNetworkServer() method. You can specify the host and port number as arguments to the method. You can
specify additional network server properties in a Properties object that you pass with the method. For example, to specify only the address and port with no additional properties: server.startNetworkServer("localhost", 1528, null); Note: The SQLFire network server supports the Apache Derby network properties documented in the Derby Server and Administration Guide. 5. Start additional network services as needed to listen on different address and port combinations. Deciding When to Use Embedded Peer-to-Peer on page 220 provides more examples of using the FabricServer interface to embed SQLFire members.
Starting a Network Server
Clients can connect to a NetworkInterface using the SQLFire JDBC client driver (URL of the form 'jdbc:sqlfire://hostname:port'). A network listener can be acquired by invoking the startNetworkServer method on a FabricServer instance. SQLFire uses the Distributed Relational Database Architecture (DRDA) protocol for client-server communication.
Network Server Properties
Specify network server properties when starting a network server on a SQLFire member. You must specify these properties as system properties:
• When using the sqlf utility, specify network server properties at the command line when you boot a SQLFire server.
• When using the embedded JDBC driver, specify all network server properties in the first JDBC connection to the SQLFire member.
• When using the FabricServer API, you can start the FabricServer instance and NetworkListener independently of one another. For example, you can start a FabricServer instance, populate tables with initial or default data, and then start a Network Listener at a later time to enable client connections. Include all Network Server properties in the Properties object when you execute the FabricService.startNetworkServer() method. The FabricServer API promotes properties in the configuration parameter to an equivalent system property if the application does not override them.
Network server startup properties use the prefix 'sqlfire.drda.' The following properties are available:
sqlfire.drda.host sqlfire.drda.keepAlive sqlfire.drda.logConnections sqlfire.drda.maxThreads sqlfire.drda.minThreads sqlfire.drda.portNumber sqlfire.drda.securityMechanism sqlfire.drda.sslMode sqlfire.drda.streamOutBufferSize sqlfire.drda.timeSlice sqlfire.drda.trace
sqlfire.drda.traceAll sqlfire.drda.traceDirectory Example The following sample code shows how to start a Network Listener after specifying several properties. import com.vmware.sqlfire.*; // start a fabricserver if not running already. FabricServer fserver = FabricServiceManager.getFabricServerInstance(); if (fserver.status() != FabricService.State.RUNNING) { Properties props = new Properties(); props.setProperty("mcast-port", "23342"); props.setProperty("conserve-sockets", "false"); props.setProperty("log-level", "fine"); props.setProperty("host-data", "true"); fserver.start(props); } // commonly used network server properties Properties netprops = new Properties(); netprops.setProperty("sqlfire.drda.minThreads", "10"); netprops.setProperty("sqlfire.drda.maxThreads", "100"); netprops.setProperty("sqlfire.drda.keepAlive", "1000"); netprops.setProperty("sqlfire.drda.sslMode" , "off"); // Other possible values // are "basic" for encryption // only with no client // authentication, and // "peerAuthentication" // for encryption with // SSL clients. // now start a network server listening on port 4343. NetworkInterface netserver = fserver.startNetworkServer("localhost", 4343, netprops); System.out.println("started network server properties with \n" + netserver.getSysinfo());
Chapter 18: Developing Java Clients and Peers
A Java application can use the JDBC thin client driver to connect to a SQLFire cluster and execute SQL statements. Or, a Java application can use the JDBC peer client driver to embed a SQLFire peer process and participate as a member of the cluster. Both drivers use the basic JDBC connection URL of jdbc:sqlfire: where:
• jdbc: is the protocol.
• sqlfire: is the subprotocol.
Connect to a SQLFire Server with the Thin Client JDBC Driver The thin client driver class is packaged in com.vmware.sqlfire.jdbc.ClientDriver. In addition to the basic JDBC Connection URL, you specify the host and port number of a SQLFire server or a locator to which the thin client driver will connect. For example: jdbc:sqlfire://myHostName:1527/ • Code Example on page 123 • Thin Client Failover on page 124 • Enabling Single-Hop Data Access on page 124 • Configuring TCP Keepalive Settings on page 125 • Thin Client Driver Limitations on page 125 Code Example This code sample shows a more complete example of connecting to a SQLFire server in a Java application: try { java.util.Properties p = new java.util.Properties(); // 1527 is the default port that a SQLFire server uses to listen for thin client connections Connection conn = DriverManager.getConnection("jdbc:sqlfire://myHostName:1527/"); // do something with the connection } catch (SQLException ex) { // handle any errors System.out.println("SQLException: " + ex.getMessage()); System.out.println("SQLState: " + ex.getSQLState());
System.out.println("VendorError: " + ex.getErrorCode()); } Thin Client Failover When you use the thin client to connect to a SQLFire locator member (rather than directly to a SQLFire server), the thin client driver can provide automatic failover if the initial connection to the distributed system is lost. See sqlf locator on page 391. Note, however, that this assumes the initial connection to the specified SQLFire locator succeeds. To improve the chances of establishing an initial connection to a SQLFire system, you can optionally specify the address of a secondary locator in addition to the primary locator, using the secondary-locators connection property. For example: jdbc:sqlfire://myLocatorAddress:1527/;secondary-locators=mySecondaryLocatorAddress:1527 Enabling Single-Hop Data Access By default, using the thin-client driver provides either one-hop or two-hop access to the data for executing queries or DML statements. One-hop access is available if the client's SQL statements work against data that happens to reside on the SQLFire member to which the thin client has connected. All other cases result in one-hop or two-hop access to the data: the SQL statement is evaluated first on the SQLFire member to which the client is connected, and if necessary, that server routes the query to other members in the cluster that host the actual data for the statement. SQLFire provides the option to provide single-hop access to data for certain queries when using a thin client connection. To use this option, set the single-hop-enabled connection property to true when connecting with the thin client driver. For example: jdbc:sqlfire://myHostName:1527/;single-hop-enabled=true Or, from within the sqlf utility: connect client 'myHostName:1527/;single-hop-enabled=true' Note: Single-hop access for thin clients requires SQLFire data store members in the distributed system to run a network server for direct client access. Configure each data store with network server functionality even if you use a locator for member discovery. See Starting a Network Server on page 120. Single-hop access is provided only for prepared statements. When you enable single-hop access on a connection and then prepare a prepared statement, the local SQLFire server adds data distribution information to other SQLFire server members in the response of the prepare message. When the prepared statement is executed, the client uses the added parameter and information fetched from the connected server to determine the exact SQLFire server on which it can find the data locally; it then directs the execution to that server. The following types of queries are good candidates for single-hop access: • Queries that have a WHERE clause on the partitioning columns of a table. • Primary key-based SELECT statements where the primary key is also the partitioning key. • IN-based WHERE clauses on partitioning columns. If the client cannot determine the location of the data based on the WHERE clause, then it defaults to standard two-hop execution. Single-hop execution is performed only when the client can be absolutely certain of the location of data that is touched by the query. Internally, the client JVM maintains a pool of connections that is shared by all of the prepared statements that might execute statements with single-hop access. For each SQLFire server, the client maintains a queue of connections that can grow to a maximum number of connection specified by the sqlfire.client.single-hop-max-connections system property. 
After this maximum number of connections has been created for a particular server, further single-hop executions must wait for a connection to become available in the queue. If the number of connections created for a particular SQLFire server has not reached the maximum,
then the client creates a new connection on the fly, uses it, and then returns it to the connection pool. The default value for sqlfire.client.single-hop-max-connections is 5 connections per server. If you are developing a client that requires more concurrent connections per server for single-hop access, increase the maximum number of connections per server using the sqlfire.client.single-hop-max-connections system property.

Note: To avoid degrading the performance of the network server, use the smallest number of concurrent single-hop threads that satisfies performance requirements.

Configuring TCP Keepalive Settings

By default, SQLFire servers use a TCP keepalive probe to help determine when clients have gone offline. SQLFire thin clients can also use these keepalive settings to accurately determine when a server has gone offline. The relevant configuration properties are:
• keepalive-count on page 342
• keepalive-idle on page 342
• keepalive-interval on page 342
To use these properties with a SQLFire thin client, include the jna.jar library in your CLASSPATH.

Note: Windows platforms do not support per-socket configuration for keepalive-count. As an alternative, you can configure a system-wide keepalive-count value in some versions of Windows. See http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220%28v=vs.85%29.aspx. Windows Vista and later versions keep this value fixed at 10.

Note: On Solaris platforms prior to r10, system-wide TCP keepalive settings must be changed to larger values (approximately 30 seconds) so that clients can detect server failures and vice versa. See http://docs.oracle.com/cd/E19082-01/819-2724/fsvdg/index.html. This also applies to other non-Linux, non-Windows platforms. For example, see http://www-01.ibm.com/support/docview.wss?uid=swg21231084.

Thin Client Driver Limitations

When the default batching mode is enabled for transactions, SQLFire detects any conflicts in DML operations lazily. DML conflicts may be thrown by the system at some point later in the transaction (for example, even when executing queries or at commit time). You can configure SQLFire to detect conflicts immediately at operation time by setting the gemfire.tx-disable-batching system property to "true" on all data store members in the distributed system.

Note: Enabling gemfire.tx-disable-batching can degrade performance significantly. Enable this option only after you have thoroughly tested the setting in your system and have determined that the performance tradeoff is necessary to provide immediate conflict detection with thin clients.

If you use the thin client driver to perform an insert to a table that has no primary key, an automatic retry of the insert due to a failover (available when connecting via a locator member) can result in duplicate rows being added to the table.

The thin-client driver has the following limitations when the single-hop connection property is enabled:
• Single-hop access is not provided when using transactions or WAN replication. Note: Do not enable single-hop access when using transactions or WAN configurations, as it can lead to unexpected behavior.
• Single-hop access is only supported for partitioned tables where the partitioning column(s) are of the basic data types: integer, long, double, decimal, char, varchar, and real. If a table is partitioned on a column of any other data type (such as date, time, timestamp, blob, or clob), then queries on that table are not considered for single-hop execution.
• Single-hop execution is attempted only for prepared statements, and not for simple statements.
• Single-hop execution is attempted only for SELECT, UPDATE, and DELETE statements, but not for INSERT statements.
• SQLFire does not support single-hop access for queries that require data from multiple data stores to be merged together. Typically, queries that have aggregates like MAX, MIN, AVG, ORDER BY, and so forth are executed as if single-hop were disabled. If an aggregate query can provide all results from a single data store, then SQLFire provides single-hop access for the query.
• Tables that have an expression resolver are not considered for single-hop access.
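Putting the preceding rules together, the following sketch shows a thin-client connection with single-hop enabled and a prepared statement whose WHERE clause targets the partitioning column. The locator address, the orders table, and its customer_id partitioning column are assumptions for illustration only, and the SQLFire client JAR is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SingleHopQuery {
    public static void main(String[] args) throws Exception {
        // Connect through a locator and enable single-hop routing for qualifying statements.
        Connection conn = DriverManager.getConnection(
            "jdbc:sqlfire://myLocatorAddress:1527/;single-hop-enabled=true");

        // A prepared statement that filters on the (hypothetical) partitioning column,
        // so the client can route execution directly to the server that owns the data.
        PreparedStatement ps = conn.prepareStatement(
            "SELECT order_id, subtotal FROM orders WHERE customer_id = ?");
        ps.setInt(1, 42);

        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getInt(1) + ", " + rs.getBigDecimal(2));
        }
        rs.close();
        ps.close();
        conn.close();
    }
}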
Start a SQLFire Peer with the Peer Client JDBC Driver The peer client driver class is packaged in com.vmware.sqlfire.jdbc.EmbeddedDriver. In addition to the basic JDBC Connection URL, peer client driver connections require one or more boot and/or connection properties to configure the embedded SQLFire peer process for member discovery and other features. For example: jdbc:sqlfire:;mcast-port=33666;host-data=false The connection properties can be specified either in the connection URL or passed in the Properties parameter to the DriverManager.getConnection method. In the connection URL, you specify attributes as key=value pairs: [;attributes] preceded by and separated by semicolons. For more on these properties, see Configuration Properties on page 327. In this case, all peers, including peer clients and SQLFire servers, are part of the same distributed system, discovering each other using either locator(s) or multicast. SQL statements that you execute in the peer client have at most single-hop access to data in the distributed system. (The SQLFire JDBC thin-client driver also provides single-hop access to data for lightweight client applications.) Note: If you use peer clients in a SQLFire distributed system that also contains data stores, keep in mind that peer clients themselves cannot use disk stores or persist data. You must start data stores and locators before starting peer clients, because peer clients rely on data stores to persist data. See Peer Client Considerations for Persistent Data on page 92 for more information. This code sample shows a more complete example of starting a peer client in a Java application: try { java.util.Properties p = new java.util.Properties(); // Use the locator running on the local host with port 3340 for peer member discovery... Connection conn = DriverManager.getConnection("jdbc:sqlfire:;locators=localhost[3340];mcast-port=0;host-data=false"); // Alternatively, use multicast on port 33666 for peer member discovery... /* Connection conn = DriverManager.getConnection("jdbc:sqlfire:;mcast-port=33666;host-data=false"); */ // do something with the connection } catch (SQLException ex) { // handle any errors 126
System.out.println("SQLException: " + ex.getMessage()); System.out.println("SQLState: " + ex.getSQLState()); System.out.println("VendorError: " + ex.getErrorCode()); } Note: • Unlike Derby, SQLFire does not use a databaseName. Instead of a "database" the connection is to a distributed system. The distributed system is uniquely identified by either the mcast-port or the locators. See Configuration Properties on page 327. • The subprotocol in the URL sqlfire: ends with a colon (:) and the list of connection attributes starts with a semicolon (;). • Setting mcast-port to 0 without specifying locators starts a "loner" (single member) distributed system. See Configuration Properties on page 327. • The list of connection attributes is not parsed for correctness. If you pass an incorrect attribute, it is simply ignored. • Setting the host-data attribute to true (default) specifies that data should be hosted in this member. To avoid hosting data in a member, such as in a peer client, set host-date to false.
Chapter 19
Configuring SQLFire as a JDBC Datasource The SQLFire JDBC implementation enables you to use a distributed system as an embedded JDBC datasource in products such as WebLogic Server. Procedure Follow this general procedure when setting up SQLFire as a datasource in a third-party product: 1. For products such as WebLogic Server that provide datasource templates, select "Apache Derby" or "Other" as the database type if vFabric SQLFire is not explicitly supported. 2. Specify "sqlfire" as the database name. This represents a single SQLFire distributed system. (SQLFire does not contain multiple databases as in Apache Derby or other relational database systems.) 3. For the hostname and port, specify the hostname and port combination of a SQLFire locator or a SQLFire server. This is the same hostname and port combination you would use to connect as a client from the sqlf prompt. 4. For the database username and password, enter a valid username and password combination if you have enabled authentication in your system (using the -auth-provider property). If you have not configured authentication in SQLFire, specify "app" as both the username and password values, or any other temporary value. Note: SQLFire uses the username specified in the JDBC connection as the schema name when you do not provide the schema name for a database object. SQLFire uses "APP" as the default schema. If your system does not enable authentication, you can specify "APP" for both the username and password to maintain consistency with the default schema behavior. 5. For the driver class, specify: com.vmware.sqlfire.internal.jdbc.ClientDataSource 6. The JDBC URL that you specify must begin with jdbc:sqlfire://. Remove any template properties such as create=true if they are present in the URL or properties fields. In products such as WebLogic, you cannot specify JDBC connection properties for an embedded datasource as part of the JDBC URL. Instead, use the properties field and specify connectionAttributes using the format: connectionAttributes=attribute;attribute;... For example: connectionAttributes=mcast-address=239.192.81.1;mcast-port=10334 or connectionAttributes=locators=239.192.81.1;mcast-port=0 See also the Apache Derby documentation for EmbeddedDataSource.
7. A process can connect to only one SQLFire distributed system at a time. If you want to connect to a different SQLFire distributed system, shut down the current embedded data source before re-connecting with a different datasource.
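If you configure the datasource programmatically instead of through an administration console, a sketch along the following lines may be useful. The setter names are assumptions based on the Derby-style ClientDataSource API and are not documented in this guide; verify them against the driver you are using.

import java.sql.Connection;
import com.vmware.sqlfire.internal.jdbc.ClientDataSource;

public class SQLFireDataSourceExample {
    public static void main(String[] args) throws Exception {
        ClientDataSource ds = new ClientDataSource();
        // Hostname and port of a SQLFire locator or server (placeholders).
        ds.setServerName("localhost");
        ds.setPortNumber(1527);
        // SQLFire exposes a single "database" per distributed system.
        ds.setDatabaseName("sqlfire");
        // Optional semicolon-separated connection attributes.
        ds.setConnectionAttributes("single-hop-enabled=true");

        // Use "app" for both values, or valid credentials if authentication is enabled.
        Connection conn = ds.getConnection("app", "app");
        // ... use the connection ...
        conn.close();
    }
}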
Chapter 20
Storing and Loading JAR Files in SQLFire Application logic, which can be used by SQL functions and procedures, includes Java class files stored in a JAR file format. Storing application JAR files in SQLFire simplifies application deployment, because it reduces the potential for problems with a user's classpath. SQLFire automatically loads installed JAR file classes into the class loader so that you can use them in your SQLFire applications and procedures. The JAR classes are available to all members of the SQLFire distributed system, including those that join the system at a later time. Note: Many of the topics in this section were adapted from the Apache Derby documentation source files, and are subject to the Apache license: Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Class Loading Overview You store application classes, resources, and procedure implementations in SQLFire by installing one or more JAR files. After installing the JAR file, an application can then access the classes without having to be coded in a particular way. These are the basic steps for storing and using JAR files in SQLFire. Create JAR Files for Your Application When you create a JAR file for installation, include any Java classes that are intended for SQLFire class loading. This might include stored procedure or result processor implementations, as well as supporting classes. When you add classes to the JAR file, do not include: • The standard Java packages (java.*, javax.*)
SQLFire does not prevent you from storing these standard JAR files, but the classes are never loaded from the JAR file.
• Classes that are supplied with your Java environment (for example, sun.*)
SQLFire can load classes from any number of JAR files from any number of schemas. Create JAR files intended for SQLFire class loading in the same way that you would create a JAR file for including in a user's classpath. For example: jar cf travelagent.jar travelagent/*.class Different IDEs have tools to generate a list of contents for a JAR file based on your application. If your application requires classes from other JAR files, you have the following options:
• Extract the required third-party classes from their JAR files, and include only those classes in your JAR file. Use this option when you need only a small subset of the classes in the third-party JAR file.
• Store the entire third-party JAR file in the database. Use this option when you need most or all of the classes in the third-party JAR file, because you can then upgrade your application and third-party logic independently of one another.
• Deploy the third-party JAR file in the user's class path. Use this option when the necessary classes are already installed on a user's computer.

Manage JAR Files in SQLFire

When you install a JAR file, you specify a unique SQLFire JAR name, which is a SQL92Identifier. SQLFire installs the JAR file and automatically loads the JAR file classes into its class loader, and the JAR becomes available to all members of the distributed system (including members that join the system at a later time). SQLFire provides sqlf commands to install, replace, and remove JAR files as described in these sections:
• Installing a JAR File on page 134
• Replacing a JAR File on page 135
• Removing an Installed JAR File on page 135

Installing a JAR File

To install a JAR file to a SQLFire distributed system, use the sqlf install-jar command. For example, the following command installs the tours.jar file to the APP schema: sqlf install-jar -name=APP.toursjar -file=c:\tours.jar -client-bind-address=locator_address -client-port=locator_port If the SQLFire distributed system uses multicast for discovery instead of a locator, specify the multicast properties instead, as in: sqlf install-jar -name=toursjar -file=c:\tours.jar -mcast-address=multicast_address -mcast-port=multicast_port The -name that you provide for the JAR must be a unique identifier. You must include a schema name to qualify the identifier. You can use the identifier with later calls to sqlf replace-jar or sqlf remove-jar. The -file option specifies the location of the JAR file to install using either a local path or a URL. After installing the JAR to the distributed system, SQLFire automatically loads the JAR file classes into its class loader; you do not need to explicitly load classes after installing the JAR. The JAR and its classes are available to all members of the SQLFire distributed system, including those that later join the cluster.
Note: After you install a JAR file, you cannot modify any of the individual classes or resources within the JAR file. Instead, you must replace the entire JAR file to update a class. Replacing a JAR File Use the sqlf replace-jar command to replace an installed JAR file with a new JAR. Specify the identifier of an existing JAR installation when using the command, as in: sqlf replace-jar -name=APP.toursjar -file=c:\tours2.jar -client-bind-address=locator_address -client-port=locator_port When you replace a JAR file, SQLFire loads the new classes right away without your having to reboot. It also recompiles objects that depend on the JAR file, such as installed listener implementations. Removing an Installed JAR File Specify the JAR identifier with the sqlf remove-jar command to remove an installed JAR file. For example, the following command removes the class files associated with the APP.toursjar JAR installation: sqlf remove-jar -name=APP.toursjar -client-bind-address=locator_address -client-port=locator_port Code Your Applications In your SQLFire applications, you can load the installed classes either by indirectly referencing them in the code or by directly using java.lang.Class.forName. You load resources the way you normally would, using the standard java.lang.Class.getResourceAsStream, a mechanism that allows an application to access resources defined in the classpath without knowing where or how they are stored. You do not need to make any changes to the way code interacts with SQLFire and its JDBC driver. An application can safely attempt to boot SQLFire, even though it is already running, without any errors. Applications connect to SQLFire in the usual manner. Note: The method getResource is not supported.
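For illustration, code that runs in a SQLFire member (for example, an embedded peer or a stored procedure implementation) might load an installed class and one of its packaged resources as follows. The class and resource names are hypothetical.

import java.io.InputStream;
import java.util.Properties;

public final class InstalledJarAccess {
    public static Properties loadAgentConfig() throws Exception {
        // Classes installed with sqlf install-jar are visible to the SQLFire class loader.
        Class<?> agentClass = Class.forName("com.example.travelagent.TravelAgent");
        Object agent = agentClass.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + agent.getClass().getName());

        // Read a resource packaged in the installed JAR with getResourceAsStream
        // (getResource is not supported).
        Properties props = new Properties();
        InputStream in = agentClass.getResourceAsStream("/travelagent/agent.properties");
        if (in != null) {
            try {
                props.load(in);
            } finally {
                in.close();
            }
        }
        return props;
    }
}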
Alternate Methods for Managing JAR Files SQLFire also provides system procedures that you can use to interactively install and manage JAR files from a client connection. Keep in mind that the procedures have certain limitations compared to using sqlf commands to manage JAR files. This topic contains the following sections: • Restrictions for Managing JAR Files Using System Procedures on page 134 • Installing a JAR File on page 134 • Installing a JAR File Directly into SYS.JARS on page 134 • Replacing a JAR File on page 135 • Removing an Installed JAR File on page 135 See also JAR Installation Procedures on page 623 for more information about the procedures used to manage installed JAR files.
Restrictions for Managing JAR Files Using System Procedures To use the SQLJ.INSTALL_JAR or SQLJ.REPLACE_JAR procedure, the JAR file path must be available on the specific SQLFire data store to which the client is connected. If the client connects directly to a known SQLFire server, then only that server requires the JAR file to be available at the specified path when the procedure executes. However, if the client connects using a locator then it may be connected to any available server in the distributed system. In this case, the JAR file path should be available to all members of the SQLFire cluster (for example, from a fileshare location such as z:\) to ensure that the procedure can execute. Note: If you cannot provide a common JAR location to ensure that SQLJ.INSTALL_JAR or SQLJ.REPLACE_JAR executes, use the sqlf commands described in Manage JAR Files in SQLFire on page 132 or manually inject the JAR file into the SQLFire system as described in Installing a JAR File Directly into SYS.JARS on page 134 . Installing a JAR File with SQLJ.INSTALL_JAR To install a JAR file to SQLFire, connect to a SQLFire member and execute the install_jar procedure. For example, the following procedure installs the tours.jar file to the APP schema, from a fileshare location: call sqlj.install_jar( 'z:\tours.jar', 'APP.Sample1', 0) Note: The final integer argument specifies an alias for the JAR file. However, SQLFire ignores this argument, so it is normally set to 0. The second argument defines an identifier for the JAR file in the SQLFire system. You must include a schema name to qualify the identifier. You can use the JAR identifier at a later time with the SQLJ.REPLACE_JAR and SQLJ.REMOVE_JAR procedures. You can optionally specify a quoted identifier for the SQLFire JAR name: call sqlj.install_jar( 'z:\tours.jar', 'APP."Sample2"', 0) After installing the JAR, SQLFire automatically loads the JAR file classes into its class loader; you do not need to explicitly load classes after installing the JAR. The JAR and its classes are available to all members of the SQLFire distributed system, including those that later join the cluster. Note: After you install a JAR file, you cannot modify any of the individual classes or resources within the JAR file. Instead, you must replace the entire JAR file to update a class. Installing a JAR File Directly into SYS.JARS If the JAR file to install is not available to every data store member in the SQLFire cluster, inject the JAR file directly from the client into the SQLFire system using Java code. The SYS.JARS table stores installed JAR files, and you can insert a new JAR into this table as a byte[] value. For example, an application can use code similar to the following to add a JAR to the SYS.JARS table: byte[] jarBytes = getJarBytes(myjar); String sql = "insert into SYS.JARS values(?, ?)"; PreparedStatement ps = conn.prepareStatement(sql); ps.setString(1, "app.sample1"); ps.setBytes(2, jarBytes); ps.executeUpdate();
In this example application, the getJarBytes() method would be implemented in the following way: public static byte[] getJarBytes(String jar) throws Exception { FileInputStream fis = new FileInputStream(jar); byte[] data = new byte[4096]; int len; ByteArrayOutputStream bos = new ByteArrayOutputStream(); while ((len = fis.read(data)) != -1) { bos.write(data, 0, len); } return bos.toByteArray(); } SQLFire automatically loads the installed JAR file class into its class loader; you do not need to explicitly load classes after installing the JAR. Replacing a JAR File with SQLJ.REPLACE_JAR Execute the SQLJ.REPLACE_JAR procedure on a SQLFire member to replace a JAR file. For example, the following procedure replaces the JAR contents of the loaded APP.Sample1 JAR installation with the contents of the newtours.jar file. call sqlj.replace_jar( 'c:\myjarfiles\newtours.jar', 'APP.Sample1'); When you replace a JAR file, SQLFire loads the new classes right away without your having to reboot. Removing an Installed JAR File with SQLJ.REMOVE_JAR Execute the SQLJ.REMOVE_JAR procedure on a SQLFire member to remove an installed JAR file. For example, the following command removes the class files associated with the APP.Sample1 JAR installation: call sqlj.remove_jar( 'APP.Sample1', 0); Note: The final integer argument specifies an alias for the JAR file. However, SQLFire ignores this argument, so it is normally set to 0.
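The system procedures shown above can also be invoked from a Java client through a CallableStatement. A sketch, reusing the z:\tours.jar path and APP.Sample1 identifier from the earlier examples; the JAR path must still be readable by the server that executes the procedure:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class ManageJarsFromClient {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlfire://localhost:1527/");

        // SQLJ.INSTALL_JAR(jar path, SQLFire JAR identifier, ignored alias argument)
        CallableStatement install = conn.prepareCall("call sqlj.install_jar(?, ?, ?)");
        install.setString(1, "z:\\tours.jar");
        install.setString(2, "APP.Sample1");
        install.setInt(3, 0);
        install.execute();
        install.close();

        // Remove the installed JAR by its identifier when it is no longer needed.
        CallableStatement remove = conn.prepareCall("call sqlj.remove_jar(?, ?)");
        remove.setString(1, "APP.Sample1");
        remove.setInt(2, 0);
        remove.execute();
        remove.close();

        conn.close();
    }
}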
Chapter 21
Developing ADO.NET Client Applications SQLFire provides a managed ADO.NET driver that you can use to develop non-Java client applications. The ADO.NET driver uses IKVM technology to translate SQLFire JDBC core libraries into .NET MSIL. The driver is supported with the Microsoft .NET 2.0 SP1 or higher framework. Note: You cannot register a custom .NET type as a user-defined type in SQLFire.
About the ADO.NET Driver The SQLFire ADO.NET driver is an implementation of Microsoft's ADO.NET interfaces and abstract classes. It provides connectivity to a SQLFire system as a network client driver. In this release of SQLFire, the ADO.NET driver does not support running a SQLFire member in an embedded mode (embedded driver) within an application. The driver provides fast access to a SQLFire cluster from .NET clients and enables advanced SQLFire features such as: • Transparent access to partitioned or replicated data in the cluster • High availability • Distributed queries and transactions You can access the driver through any .NET language including C#, C++/CLI, and Visual Basic .NET. As with the JDBC network driver, the ADO.NET driver complies with the DRDA standard protocol and utilizes several SQLFire extensions to provide enhanced features. The following sections provide more information about the ADO.NET driver: • High Availability on page 137 • Load Balancing on page 138 • Extensions for Batching and Row Locking on page 138 High Availability The ADO.NET driver maintains information about the servers available in a SQLFire cluster. If the server to which the client is connected goes down, then the driver automatically re-routes the connection to another server available in the cluster. To take full advantage of this feature, clients should connect using the address of a stand-alone SQLFire locator rather than a SQLFire server, because locators provide more accurate information about the available servers to which a client can fail over. If you specify the address of a server rather than a stand-alone locator, then it is possible that the client will fail to find any available server if those servers have been taken offline.
Note: The automatic failover feature assumes the client's initial connection to the specified SQLFire locator succeeds. To improve the chances of establishing an initial connection to a SQLFire system, you can optionally specify the address of a secondary locator in addition to the primary locator, using the secondary-locators connection property. For example, use a connection string similar to "server=primary-locator-address:port;secondary-locators=secondary-locator-address:port" when you initiate a connection using the ADO.NET driver.

If the application is not executing a transaction, then failover occurs transparently to the application. Failed operations are automatically re-executed if necessary. The application does not receive any exceptions as long as at least one server is available in the cluster to service network clients. However, when querying LOB data types (BLOB/CLOB), clients may receive a SQLFException with State "XJ217" due to the LOB handle not being found on the new server. In this case, the client may need to retry the query explicitly (issue #43253 in vFabric SQLFire Limitations on page 765 provides more details). In this release of SQLFire, if the application is in the middle of a transaction and the server to which the client is connected fails, then the current transaction fails with a connection exception (SQLFException with State "08006") and the application must re-execute the transaction. To handle such a case, applications can create a new connection, and then begin a new transaction and retry the transaction.

Load Balancing

When using the SQLFire locator or a locator embedded inside a server, servers inform the active locator of the load on each server using the number of current client connections on the server. This allows the locator to determine the best server to use for a new client connection. In this way, a network client that connects using a locator is always directed to the least-loaded server in the cluster.

Extensions for Batching and Row Locking

Similar to the JDBC API, the ADO.NET driver provides extensions to the DbCommand API to allow for batching without requiring a DbDataAdapter. Also, the DbDataReader implementation, SQLFDataReader, allows for expressing the intent to lock rows when executing a SELECT … FOR UPDATE statement. Additional extensions from the JDBC API are also provided, such as SQLFConnection.AutoCommit.
ADO.NET Driver Classes All ADO.NET driver implementation classes reside in the VMware.Data.SQLFire namespace.
SQLFBatchUpdateException
Description: Thrown when a batch operation fails. Extends SQLFException.
More information: VMware.Data.SQLFire.BatchUpdateException on page 743

SQLFClientConnection
Description: Creates a connection to SQLFire with a URL of the form "server=hostname:port". Provides several extension methods to support SQLFire transactions.
More information: Connecting to SQLFire with the ADO.NET Driver on page 140; Managing SQLFire Transactions on page 147; Performing Batch Updates on page 148; VMWare.Data.SQLFire.SQLFClientConnection on page 743

SQLFCommand
Description: Implements the .NET System.Data.Common.DbCommand class.
More information: Executing SQL Commands on page 141; Working with Result Sets on page 141; Specifying Command Parameters with SQLFParameter on page 144; VMware.Data.SQLFire.SQLFCommand on page 748; System.Data.Common.DbCommand (.NET)

SQLFType
Description: SQLFire types. Each type corresponds to a JDBC type.
More information: java.sql.Types; VMware.Data.SQLFire.SQLFType on page 754

SQLFDataReader
Description: Implements the .NET System.Data.Common.DbDataReader class.
More information: Working with Result Sets on page 141; VMware.Data.SQLFire.SQLFDataReader on page 757; System.Data.Common.DbDataReader (.NET)

SQLFDataAdapter
Description: Implements the .NET System.Data.Common.DbDataAdapter class. SQLFDataAdapter uses batch updates when the UpdateBatchSize property is set to a value greater than one.
More information: Storing a Table on page 142; Storing Multiple Tables on page 143; Updating Row Data on page 145; Adding Rows to a Table on page 146; VMware.Data.SQLFire.SQLFDataAdapter on page 755; System.Data.Common.DbDataAdapter (.NET)

SQLFCommandBuilder
Description: Implements the .NET System.Data.Common.DbCommandBuilder class. Use this class to generate commands with SQLFDataAdapter.
More information: Updating Row Data on page 145; Adding Rows to a Table on page 146; VMware.Data.SQLFire.SQLFCommandBuilder on page 754

SQLFParameter
Description: Implements the .NET System.Data.Common.DbParameter class.
More information: Specifying Command Parameters with SQLFParameter on page 144; VMware.Data.SQLFire.SQLFParameter on page 760; System.Data.Common.DbParameter (.NET)

SQLFParameterCollection
Description: Implements the .NET System.Data.Common.DbParameterCollection class.
More information: Specifying Command Parameters with SQLFParameter on page 144; VMware.Data.SQLFire.SQLFParameterCollection on page 760; System.Data.Common.DbParameterCollection (.NET)

SQLFTransaction
Description: Implements the .NET System.Data.Common.DbTransaction class. SQLFire currently supports only the READ_COMMITTED isolation level.
More information: Managing SQLFire Transactions on page 147; VMware.Data.SQLFire.SQLFTransaction on page 761; System.Data.Common.DbTransaction (.NET)

SQLFException
Description: Extends the System.Data.Common.DbException class. This exception is thrown for all SQLFire exceptions. It exposes additional State and Severity properties that have the same meanings as in the JDBC API.
More information: VMware.Data.SQLFire.SQLFException on page 759; System.Data.Common.DBException (.NET); JDBC API Javadoc
Installing and Using the ADO.NET driver
Follow this procedure to install and use the SQLFire ADO.NET driver.
1. Add a reference to VMware.Data.SQLFire.dll in your project. This DLL is installed in the vFabric_SQLFire_11_bNNNNN\adonet\lib directory.
2. Reference the driver namespace in each source file where you want to use SQLFire components. For example, include this directive with all other references required in your application: using VMware.Data.SQLFire;
Connecting to SQLFire with the ADO.NET Driver Use the SQLFClientConnection class to create a new connection to a SQLFire cluster. You must specify a connection URL of the form "server=hostname:port". For example: string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { // Initialize and open connection connection = new SQLFClientConnection(connectionStr); connection.Open(); /// /// Execute SQLFire commands /// } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); } This creates a client connection to a SQLFire server running on the local machine and listening on the default port 1527. After you obtain a connection you can execute DDL and DML statements as necessary.
Managing Connections In this release of SQLFire, the VMWare.Data.SQLFire.SQLFClientConnection class and underlying driver do not maintain any implicit pool of connections. Creating a new client connection to a SQLFire cluster is a somewhat expensive operation, so use connection resources carefully. There are two possible approaches for managing connections: • Use a custom pool of connections. The pool can be maintained as a thread-safe bounded queue of available connections. When a module is done with the connection, it can return the connection to the queue. If many connections are lying idle in the pool for some time, those connections can be cleaned up. This approach requires some work to implement, but tuning a connection pool for application requirements generally provides the best performance and resource utilization. • Use a thread-local connection. This approach relies on having a connection in the thread local (ThreadStatic attribute), so that all modules can access the connection quickly without having to create it explicitly. The advantage of this approach is that applications do not need to pass the connection object to all of the modules that require access. Each module can obtain the ThreadStatic connection when required. Although this approach
is easier to implement, it is useful only in scenarios where an application has a fixed set of threads, or long-lived threads that operate against a SQLFire system.
Executing SQL Commands To execute a SQL command, create a SQLFCommand from the SQLFire connection and supply the text of a SQL command. For example: string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = "SELECT COUNT(*) FROM product"; connection.Open(); int prodCount = Convert.ToInt32(command.ExecuteScalar()); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Working with Result Sets SQLFDataReader helps you work with multiple rows that are returned from a SQL statement. The following example queries all rows from a sample table and stores the results in a SQLFDataReader object. The application then accesses and displays each row in the result set. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = "SELECT * FROM product"; connection.Open();
SQLFDataReader reader = command.ExecuteReader(); StringBuilder row = new StringBuilder(); while (reader.Read()) { for (int i = 0; i < reader.FieldCount; i++) row.AppendFormat("{0}, ", reader.GetString(i)); Console.WriteLine(row.ToString()); } } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Storing a Table You can use SQLFDataAdapter to populate a System.Data.DataTable object with the results from a SQL command, or to refresh a DataTable object with new results. The following example fills a DataTable with rows retrieved from a table, and then uses the DataTable to display each row. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = "SELECT * FROM product"; connection.Open(); // Create adapter and populate the DataTable object SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("product"); adapter.Fill(table); // Parse the DataTable object by rows foreach (DataRow row in table.Rows) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < row.Table.Columns.Count; i++) sb.AppendFormat("{0}, ", (row[i].ToString())); Console.WriteLine(sb.ToString()); 142
} } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Storing Multiple Tables You can use SQLFDataAdapter with a System.Data.DataSet object to hold multiple result sets. The following example uses a dataset to store the full contents from three different tables. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; connection.Open(); // Create adapter and data set to hold multiple result sets SQLFDataAdapter adapter = command.CreateDataAdapter(); DataSet dataset = new DataSet("CustomerOrder"); // Retrieve all customer records command.CommandText = "SELECT * FROM customer"; adapter.Fill(dataset, "customer"); // Retrieve all order records command.CommandText = "SELECT * FROM orders"; adapter.Fill(dataset, "orders"); // Retrieve all orderdetail records command.CommandText = "SELECT * FROM orderdetail"; adapter.Fill(dataset, "orderdetail"); // Parse all tables and rows in the data set foreach (DataTable table in dataset.Tables) { foreach (DataRow row in table.Rows) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < row.Table.Columns.Count; i++) sb.AppendFormat("{0}, ", (row[i].ToString())); Console.WriteLine(sb.ToString());
} } } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Specifying Command Parameters with SQLFParameter
Use SQLFParameter and SQLFParameterCollection to define input parameters for a SQLFCommand object. You do not need to specify the size of variable-sized data or the scale and precision for numeric data. The only case where a size specification can be useful is when data must be truncated to the given size. In all other cases the driver or server determines the data size. This applies to both input and output parameters; for output parameters, the driver sets the correct size, precision, and scale as appropriate. The following example adds two SQLFParameter objects to a SQLFCommand parameter collection, and then uses the command to create an SQLFDataAdapter. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { // Initialize and open connection connection = new SQLFClientConnection(connectionStr); connection.Open(); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = @"SELECT * FROM orders WHERE order_date = ? AND subtotal > ?"; // Insert order_date value and add to command's Parameters collection SQLFParameter param1 = command.CreateParameter(); param1.DbType = DbType.Date; param1.Value = DateTime.Today; command.Parameters.Add(param1); // Insert subtotal value and add to command's Parameters collection SQLFParameter param2 = command.CreateParameter(); param2.DbType = DbType.Decimal; param2.Value = 999.99; command.Parameters.Add(param2); SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("orders"); adapter.Fill(table); } catch (Exception e)
{ /// /// Log or re-throw exception /// } finally { connection.Close(); } You can also specify a range of parameters as an array of objects, as shown in this example: string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { // Initialize and open connection connection = new SQLFClientConnection(connectionStr); connection.Open(); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = @"SELECT * FROM orders WHERE order_date = ? AND subtotal > ?"; // Create an object array containing the parameters' values object[] parameters = new object[] { DateTime.Today, 999.99 }; // Add the object array as parameters (range) command.Parameters.AddRange(parameters); SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("orders"); adapter.Fill(table); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Updating Row Data You can use SQLFCommandBuilder to update, delete, and insert rows that are stored in a SQLFDataAdapter object. The following example uses an SQLFDataAdapter object to store the contents of a table, and then update row data. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); 145
SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = @"SELECT unit_cost, retail_price FROM product WHERE product_id="; connection.Open(); // Create adapter and populate the DataTable object SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("product"); adapter.Fill(table); // Generate update command SQLFCommandBuilder builder = new SQLFCommandBuilder(adapter); adapter.UpdateCommand = builder.GetUpdateCommand(); // Modify product pricing table.Rows[0]["unit_cost"] = 99.99; table.Rows[0]["retail_price"] = 199.99; // Update the underlying table adapter.Update(table); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Adding Rows to a Table You can use SQLFCommandBuilder to update, delete, and insert rows that are stored in an SQLFDataAdapter object. The following example uses a SQLFDataAdapter object to store the contents of a table, and then insert new rows. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; 146
command.CommandText = @"SELECT * FROM product"; connection.Open(); // Create adapter and populate the DataTable object SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("product"); adapter.Fill(table); // Generate update command SQLFCommandBuilder builder = new SQLFCommandBuilder(adapter); adapter.InsertCommand = builder.GetInsertCommand(); // Create new product row DataRow row = table.NewRow(); row[0] = ; row[1] = <...>; row[2] = <...>; ... // Update the underlying table adapter.Update(table); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Managing SQLFire Transactions
SQLFClientConnection implements methods to help you delimit SQLFire transactions in your application code. You begin and commit transactions using the connection object itself. The following example uses an SQLFDataAdapter object to insert and update table data within a single transaction. string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { // Open connection, disable auto-commit, and start transaction connection = new SQLFClientConnection(connectionStr); connection.AutoCommit = false; connection.BeginSQLFTransaction(IsolationLevel.ReadCommitted); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; connection.Open(); // Get product info
command.CommandText = "SELECT * FROM product WHERE product_id=?"; SQLFDataAdapter adapter = command.CreateDataAdapter(); DataTable table = new DataTable("product"); adapter.Fill(table); // Create new order command.CommandText = "INSERT INTO orders VALUES(?, ?, ?, ?, ?)"; command.ExecuteNonQuery(); // Create new order detail command.CommandText = "INSERT INTO orderdetail VALUES(?, ?, ?, ?, ?)"; command.ExecuteNonQuery(); // Update product quantity command.CommandText = "UPDATE product SET quantity=? WHERE product_id=?"; command.ExecuteNonQuery(); // Commit transaction connection.Commit(); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Performing Batch Updates The SQLFCommand object provides AddBatch(), ExecuteBatch(), and ClearBatch() methods to explicitly perform batch updates. The result of ExecuteBatch() is the number of rows changed by each command in a batch. This example uses SQLFire batch operations to insert new table rows. int numRecords = 100; int batchSize = 10; // Limit number of statements per batch execution string sqlfHost = "localhost"; int sqlfPort = 1527; string connectionStr = string.Format(@"server={0}:{1}", sqlfHost, sqlfPort); SQLFClientConnection connection = null; try { connection = new SQLFClientConnection(connectionStr); connection.Open(); SQLFCommand command = connection.CreateCommand(); command.CommandType = CommandType.Text; command.CommandText = @"INSERT INTO orders(order_id, order_date, ship_date, customer_id, subtotal) VALUES(?, ?, ?, ?, ?)"; // Prepare batch statement 148
command.Prepare(); int stCount = 0; // batch statements count for (int i = 0; i < numRecords; i++) { command.Parameters[0] = ; command.Parameters[1] = ; command.Parameters[2] = ; command.Parameters[3] = ; command.Parameters[4] = ; // Add statement to command's batch command.AddBatch(); // Execute Batch statements when batch size is reached and reset if ((++stCount) == batchSize) { command.ExecuteBatch(); stCount = 0; } } // Execute the remaining statements in the batch command.ExecuteBatch(); } catch (Exception e) { /// /// Log or re-throw exception /// } finally { connection.Close(); }
Generic Coding with the SQLFire ADO.NET Driver This release of the ADO.NET driver does not support DbProviderFactory. However, all implementation classes extend the corresponding classes in the System.Data.Common namespace. This allows applications to use the base classes of that namespace in the code and resort to SQLFire-specific classes only for creating class objects, or when SQLFire-specific methods must be invoked. If an application creates a SQLFClientConnection explicitly, then it can continue to use the base DbConnection class as far as possible. The classes VMWare.Data.SQLFire.SQLFClientConnection, VMware.Data.SQLFire.SQLFDataAdapter and VMware.Data.SQLFire.SQLFCommandBuilder require explicit creation, while the remaining class objects can be obtained using the base classes of the System.Data.Common namespace. For example, you need not create the SQLFCommand object explicitly because the DbConnection.CreateCommand() method provides a base DbCommand object. (This object is actually a SQLFCommand when it is using a SQLFClientConnection object.) Applications should use the base classes from the System.Data.Common namespace when writing ADO.NET code that can be easily changed to use a different driver, or when multiple drivers are used from the same code base. The table shows the mapping of the SQLFire ADO.NET driver classes to their base classes in the System.Data.Common namespace that you should use for generic coding. Also listed are the interfaces in System.Data implemented by the class.
Table 4: Mapping SQLFire driver classes to System.Data.Common base classes

SQLFire ADO.NET Driver Class | ADO.NET base class (in System.Data.Common) | ADO.NET interfaces (in System.Data)
SQLFClientConnection | DbConnection | IDbConnection
SQLFCommand | DbCommand | IDbCommand
SQLFCommandBuilder | DbCommandBuilder | n/a
SQLFDataAdapter | DbDataAdapter | IDbDataAdapter
SQLFDataReader | DbDataReader | IDataReader
SQLFException | DbException | n/a
SQLFParameter | DbParameter | IDataParameter, IDbDataParameter
SQLFParameterCollection | DbParameterCollection | IDataParameterCollection
SQLFRowUpdatedEventArgs | RowUpdatedEventArgs | n/a
SQLFRowUpdatingEventArgs | RowUpdatingEventArgs | n/a
SQLFTransaction | DbTransaction | IDbTransaction
SQLFType | (use DbTypes as in Table 88: SQL types to SQLFType mapping on page 754) | See VMware.Data.SQLFire.SQLFType on page 754.
For example: // Open a new connection to the network server running on localhost:1527 string host = "localhost"; int port = 1527; string connectionStr = string.Format("server={0}:{1}", host, port); // use the SQLFire specific class for connection creation using (SQLFClientConnection conn = new SQLFClientConnection(connectionStr)) { conn.Open(); // create a table // using the base DbCommand class rather than SQLFire specific class DbCommand cmd = conn.CreateCommand(); cmd.CommandText = "create table t1 (id int primary key, addr varchar(20))"; cmd.ExecuteNonQuery(); // insert into the table using named parameters // using an abstracted method that can deal with difference in the // conventions of named parameters in different drivers cmd = conn.CreateCommand(); string idPrm = GetEscapedParameterName("ID"); string addrPrm = GetEscapedParameterName("ADDR"); cmd.CommandText = "insert into t1 values (" + idPrm + "," + addrPrm + ")"; cmd.Prepare(); // using the base DbParameter class DbParameter prm; for (int i = 0; i < 1000; i++) { // first the parameter for ID cmd.Parameters.Clear(); prm = cmd.CreateParameter(); prm.ParameterName = "ID"; prm.DbType = DbType.Int32; prm.Value = i; 150
cmd.Parameters.Add(prm); // next the parameter for ADDR prm = cmd.CreateParameter(); prm.ParameterName = "ADDR"; prm.DbType = DbType.String; prm.Value = "addr" + i; cmd.Parameters.Add(prm); cmd.ExecuteNonQuery(); } // drop the table cmd = conn.CreateCommand(); cmd.CommandText = "drop table t1"; cmd.ExecuteNonQuery(); conn.Close(); }
Chapter 22
Using SQLFire.NET Designer The SQLFire.NET Designer component provides Visual Studio 2008 design-time support for a SQLFire distributed system. After installing the Designer component you can add SQLFire databases to the Server Explorer list in Visual Studio 2008, design queries with the Query Designer, drag-and-drop tables onto a typed DataSet, and so forth. The Designer component also enables you to create and edit SQLFire schemas directly in Visual Studio 2008.
Installing SQLFire.NET Designer The vFabric SQLFire installation provides an installation program to install and register the SQLFire.NET component. Prerequisites In order to install or use SQLFire.NET Designer, you must install this software in addition to vFabric SQLFire: • Microsoft Visual Studio 2008 • Microsoft Visual Studio 2008 SP1 • Microsoft Visual Studio 2008 SDK Procedure To install the SQLFire.NET component: 1. Go to the \adonet\SQLFireDesigner subdirectory of your vFabric SQLFire installation. (For example, c:\vFabric_SQLFire_11_bNNNNN\adonet\SQLFireDesigner). 2. Double-click the sqlfire_setup.msi installer. 3. Follow the on-screen prompts to install SQLFire.NET Designer.
Connecting to a SQLFire Distributed System Use SQLFire.NET Designer to create, modify and delete connections to vFabric SQLFire distributed systems. Prerequisites Before you follow this procedure: • Install the SQLFire.NET Designer component. See Installing SQLFire.NET Designer on page 153. • Start at least one SQLFire member with client access. See Tutorials on page 43. Procedure To connect to a SQLFire system:
1. Start Visual Studio 2008.
2. Select View > Server Explorer to open the Server Explorer window.
3. Right-click the Data Connections node and choose Add Connection... to display the Add Connection dialog.
4. Click Change... to display the Change Data Source dialog.
5. In the Data provider menu, select .NET Framework Data Provider for SQLFire and click OK.
6. In the Server field, enter the hostname and port of the SQLFire server or locator to which you want to connect (for example, localhost:1527).
7. Click Test Connection to verify that you can connect to the SQLFire system.
8. Click OK to create the connection. A SQLFire node is added to the Data Connections node. The connection settings are saved for future use. You can use the saved connection to design or edit tables, columns, indexes, foreign keys, constraints, views, and triggers.
Editing Tables
Use SQLFire.NET Designer to visually create and modify SQLFire tables.
Prerequisites
Before you follow this procedure:
• Install the SQLFire.NET Designer component. See Installing SQLFire.NET Designer on page 153.
• Create a connection to a SQLFire system in Visual Studio 2008. See Connecting to a SQLFire Distributed System on page 153.
Procedure
To design SQLFire tables with SQLFire.NET Designer:
1. Start Visual Studio 2008.
2. Select View > Server Explorer to open the Server Explorer window.
3. Right-click the Data Connections > SQLFire node.
4. To create a new table, right-click the Tables folder and select Add New Table to open the Table Designer.
5. To edit an existing table, right-click an available table name and select Design to open the Table Designer.
6. The Table Designer view provides several features to help you edit SQLFire tables:
• Column editor grid: Use the column editor grid at the top of the view to add, delete, or rearrange table columns. Enter column names and data types directly in the labeled cells. Choose Allow Nulls if the column should allow null values. Click the cell to the left of a column to select the entire column, or drag the cell to the left of the column up or down to rearrange columns. Right-click in the cell to the left of a column to access the column's context menu (for example, to delete a column or display the Properties window).
• Properties window: Use this window to edit the name and schema of the table.
• Table Designer menu: Use this menu to perform editing tasks such as adding and deleting columns, setting the primary key column, editing indexes and primary keys, editing foreign key relationships, or editing check constraints. The sections that follow describe these tasks in more detail.
7. Select File > Save <table name> or File > Save All to save your changes.
8. When creating a new table, SQLFire.NET Designer prompts you to enter a table name. Specify the schema and table name, separated by a period (for example, APP.Table1).
Chapter 23

Understanding the Data Consistency Model
All peers in a single distributed system are assumed to be colocated in the same data center and accessible with reliable bandwidth and low latencies. Replication of table data within the distributed system is always eager and synchronous in nature. Replication between two or more distributed systems, which you configure with bidirectional WAN gateway senders, is loosely coupled and asynchronous.
Data Consistency Concepts
Without a transaction (transaction isolation set to NONE), SQLFire ensures FIFO consistency for table updates. Writes performed by a single thread are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by other processes. When a table is partitioned across members of the distributed system, SQLFire uniformly distributes the data set across members that host the table so that no single member becomes a bottleneck for scalability. SQLFire ensures that a single member owns a particular row (identified by a primary key) at any given time. When an owning member fails, the ownership of the row is transferred to an alternate member in a consistent manner so that all peer servers have a consistent view of the new owner. It is the responsibility of the owning member to propagate row changes to configured replicas. All concurrent operations on the same row are serialized through the owning member before the operations are applied to replicas. All replicas see the row updates in the exact same order. Essentially, for partitioned tables SQLFire ensures that all concurrent modifications to a row are atomic and isolated from each other, and that the 'total ordering' is preserved across configured replicas. The operations are propagated in parallel from the owning member to all configured replicas. Each replica is responsible for processing the operation, and it responds with an acknowledgment (ACK). Only after receiving all ACKs from all replicas does the owning member return control to the caller. This ensures that all operations that are sequentially carried out by a single process are applied to all replicas in the same order. There are several other optimistic and eventually consistent replication schemes that use lazy replication techniques designed to conserve bandwidth and increase throughput through batching and lazily forwarding messages. In such schemes, conflicts are discovered after they happen, and agreement on the final contents is reached incrementally. This class of systems favors availability of the system even in the presence of network partitions, but compromises consistency on reads or makes reads very expensive by reading from each replica. SQLFire instead uses an eager replication model between peers by propagating to each replica in parallel and synchronously. This approach favors data availability and low latency for propagating data changes. By eagerly propagating to each of its replicas, it is possible for clients reading data to be load balanced to any of the replicas. It is assumed that network partitions are rare in practice, and when they do occur within a clustered environment,
155
Developing Applications with SQLFire
the application ecosystem is typically dealing with many distributed processes and applications, most of which are not designed to cope with partitioning problems. By offering a very loosely coupled WAN replication scheme, SQLFire enables the entire client load to be shifted to an alternate "disaster recovery" site.
No Ordering Guarantee for DML in Separate Threads SQLFire preserves the order of DML statements applied to the distributed system (and queued to AsyncEventlisteners or remote WAN sites) only for a single thread of execution. Updates from multiple threads are preserved in first-in, first-out (FIFO) order. Otherwise, SQLFire provides no "total ordering" guarantees. Data inconsistency can occur if multiple threads concurrently update the same rows of a replicated table, or if threads concurrently update related rows from replicated tables in a parent-child relationship. Concurrently updating the same row in a replicated table can result in some replicas having different values from other replicas. Concurrently deleting a row in a parent table and inserting a row in a child table can result in orphaned rows. When DML operations are queued to an AsyncEventListener or remote WAN site, similar inconsistency problems can occur with concurrent table access. For example, if two separate threads concurrently update rows on a parent table and child table, respectively, the order in which SQLFire queues those updates to an AsyncEventListener or WAN gateway may not match the order in which the tables were updated in the main distributed system. This can cause a foreign key constraint violation in a backend database (for example, when using DBSynchronizer) or in a remote WAN system that does not occur when the tables are initially updated. These types of "out of order" updates do not occur when multiple threads concurrently update the same key of a partitioned table. However, an application should always use a transaction for any operation that updates multiple rows.
Updates on Any Row Are Atomic and Isolated When rows are updated members may receive a partial row. SQLFire always clones the existing row, applies the update (changes one or more fields) and then atomically replaces the row with the updated row. This ensures that all concurrent threads reading or writing that row are always guaranteed to be isolated from access to any partial row updates.
Atomicity for Bulk Updates SQLFire does not validate all constraints for all affected rows before applying a bulk update (a single DML statement that updates or inserts multiple rows). The design is optimized for applications where such violations are rare. A constraint violation exception that is thrown during a bulk update operation does not indicate which row of the bulk update caused a violation. Applications that receive such an exception cannot determine whether any rows in the bulk operation updated successfully. To address the possibility of constraint violations, an application should apply the bulk update within the scope of a transaction, which ensures that all rows are updated or rolled back as a unit. As an alternative, the application should select rows for updating based on primary keys, and apply updates one at a time.
156
vFabric SQLFire User's Guide
Chapter
24
Using Distributed Transactions in Your Applications A transaction is a set of one or more SQL statements that make up a logical unit of work that you can commit or roll back, and that will be recovered in the event of a system failure. SQLFire's unique design for distributed transactions allows for linear scaling without compromising atomicity, consistency, isolation, and durability (ACID) properties.
Overview of SQLFire Distributed Transactions All statements in a transaction are atomic. A transaction is associated with a single connection (and database) and cannot span connections. In addition to providing linear scaling, the SQLFire transaction design minimizes messaging requirements, so that short-lived transactions are efficient. • Main Features of the SQLFire Transaction Model on page 157 • How the Transaction Model Works on page 157 Main Features of the SQLFire Transaction Model The SQLFire transaction model uses these important features: • Each SQLFire member that participates in a transaction maintains its own transaction state. Queries on the database always see committed data, and they do not need to acquire any locks; as a result, reads and writes can occur in parallel in the READ_COMMITTED isolation level. • During transactional writes, SQLFire individually locks each copy of a row that is being updated on each member. This alleviates the need for a distributed lock manager, and it allows for greater scalability. Also, SQLFire uses special read locks for REPEATABLE_READ and foreign key checks to ensure that those rows do not change for the duration of a transaction. • SQLFire locks generally fail eagerly (fail-fast) with a conflict exception (SQLState: "X0Z02") if a lock cannot be obtained due to concurrent writes from other active transactions. An exception to this fail-fast behavior occurs when the SQLFire member that initiates the transaction also hosts data for the transaction. In this case, SQLFire batches the transaction on the local member for performance reasons, and conflicts may not be detected on other nodes until just before commit time when SQLFire flushes the batched data. SQLFire never batches operations for SELECT ... FOR UPDATE statements. How the Transaction Model Works When data is managed in partitioned tables, each row is implicitly owned by a single member for non-transactional operations. However, with distributed transactions, all copies of a row are treated as being equivalent, and updates are routed to all copies in parallel. This makes the transactional behavior for partitioned tables similar to the 157
Developing Applications with SQLFire
behavior for replicated tables. The transaction manager works closely with the SQLFire membership management system to make sure that, irrespective of failures or adding/removing members, changes to all rows are either applied to all available copies at commit time, or they are applied to none. Note: SQLFire does not support adding new members to a cluster for an ongoing transaction. If you add a new member to the cluster in the middle of a transaction and the new member is to store data involved in the transaction, SQLFire implicitly rolls back the transaction and throws a SQLException (SQLState: "X0Z05"). There is no centralized transaction coordinator in SQLFire. Instead, the member on which a transaction was started acts as the coordinator for the duration of the transaction. If the application updates one or more rows, the transaction coordinator determines which owning members are involved, and acquires local "write" locks on all of the copies of the rows. At commit time, all changes are applied to the local cache and any redundant copies. If another concurrent transaction attempts to change one of the rows, the local "write" acquisition fails for the row, and that transaction is automatically rolled back. In the case where there is no persistent table involved, there is no need to issue a two-phase commit to redundant members; in this case, commits are efficient, single-phase operations. Unlike traditional distributed databases, SQLFire does not use write-ahead logging for transaction recovery in case the commit fails during replication or redundant updates to one or more members. The most likely failure scenario is one where the member is unhealthy and gets forced out of the distributed system, guaranteeing the consistency of the data. When the failed member comes back online, it automatically recovers the replicated/redundant data set and establishes coherency with the other members. If all copies of some data go down before the commit is issued, then this condition is detected using the group membership system, and the transaction is rolled back automatically on all members. Supported Transaction Isolation Levels SQLFire supports several transaction isolation levels. It does not support the SERIALIZABLE isolation level, nested transactions, or savepoints. SQLFire supports these transaction isolation levels: • NONE. By default, connections in SQLFire do not engage in transactions, unlike in other databases (see Data Consistency Concepts on page 155. This corresponds to the JDBC TRANSACTION_NONE isolation level (or IsolationLevel.Chaos in ADO.NET, or the "SET ISOLATION RESET" SQL command). However, this default behavior does not mean that there is no isolation and that connections have access to uncommitted state from other in-process transactions. The default consistency model without transactions is described in Understanding the Data Consistency Model on page 155. • READ_UNCOMMITTED. SQLFire internally upgrades this isolation to READ_COMMITTED. • READ_COMMITTED. SQLFire ensures that ongoing transactional as well as non-transactional (isolation-level NONE) operations never read uncommitted (dirty) data. SQLFire accomplishes this by maintaining transactional changes in a separate transaction state that is applied to the actual data-store for the table only at commit time. • REPEATABLE_READ. SQLFire supports the REPEATABLE_READ isolation level according to the ANSI SQL standard. A transaction that reads the same row more than once always sees the same column values for the row. 
REPEATABLE_READ also guarantees that the underlying committed row in a table never changes after the first read in a transaction, until the transaction completes (for example, it commits or aborts). SQLFire applies read and write locks to copies of selected data to ensure repeatable reads for the duration of a transaction. SQLFire does not use range locks, and phantom reads are still possible with this isolation level. In addition, readers that use the REPEATABLE_READ isolation level are guaranteed to see distributed, atomic commits. This means that if there is a transaction that writes rows and commits over multiple SQLFire members, then readers either see all of the commit row values across all members of the distributed system (after the commit), or they will see all of before-committed row values across all members. Readers never see some committed rows on one member and before-committed row values on another node. To support this behavior, SQLFire uses a 2-phase commit protocol for all REPEATABLE_READ transactions that have pending writes. 158
vFabric SQLFire User's Guide
Using Distributed Transactions in Your Applications
SQLFire detects conflicts between two transactions that write on the same row either during the transactions or just before a commit for READ_COMMITTED and REPEATABLE_READ transactions. However, if a REPEATABLE_READ transaction writes on the same row that has been read by another transaction, then SQLFire always detects such a conflict before the writer commits (in the first phase of the commit). This enables the system to minimize conflicts where reader transactions are short in duration and the transactions complete before the writer starts its commit. Note: REPEATABLE_READ transactions that have only performed reads never receive a conflict. In particular, even if a transaction reads a row after it has already been marked for write by another transaction, it is the writer that sees the conflict at commit time if the reader transaction has not completed by then. For both the write-write and write-read cases, if a reader or writer attempts to lock a row while the commit of another writer transaction on the row is in progress, then the reader waits for the commit to complete. The commit is usually short in duration, so this behavior reduces conflicts and ensures that the wait is finite. SQLFire provides these system properties that you can use to alter the conflict detection behavior for READ_COMMITTED and REPEATABLE_READ transactions: gemfire.WRITE_LOCK_TIMEOUT, gemfire.READ_LOCK_TIMEOUT, and gemfire.LOCK_MAX_TIMEOUT. Note: You must set these system properties to the same value on each data store in your SQLFire distributed system. For more information, see: • SET ISOLATION on page 543 • java.sql.Connection Interface on page 365 • sqlf commands: autocommit on page 451, commit on page 453, and rollback on page 469. Transactions and DDL Statements SQLFire permits schema and data manipulation statements (DML) within a single transaction. If you create a table in one transaction, you can also insert data into it in that same transaction. A schema manipulation statement (DDL) is not automatically committed when it is performed, but participates in the transaction within which it is issued. Although the table itself becomes visible in the system immediately, it acquires exclusive locks on the system tables and the affected tables on all the members in the cluster, so that any DML operations in other transactions will block and wait for the table's locks. For example, if a new index is created on a table in a transaction, then all other transactions that refer to that table wait for the transaction to commit or roll back. Because of this behavior, as a best practice you should keep transactions that involve DDL statements short (preferably in a single transaction by itself). Handling Member Failures These events occur in response to the failure of a single member during a transaction. 1. If the coordinator fails before a commit is fired, then each of the cohorts aborts the ongoing transaction. 2. If a participating member fails before commit is fired, then it is simply ignored. If the copies/replicas go to zero for certain keys, then any subsequent update operations on those keys throws an exception as in the case of non-transactional updates. If a commit is fired in this state, then the whole transaction is aborted. 3. If the coordinator fails before completing the commit process (with or without sending the commit message to all cohorts), the surviving cohorts determine the outcome of the transaction. 
If all of the cohorts are in the PREPARED state and successfully apply changes to the cache without any unique constraint violations, the transaction is committed on all cohorts. Otherwise, if any member reports
159
Developing Applications with SQLFire
failure or the last copy the associated rows goes down during the PREPARED state, the transaction is rolled back on all cohorts. 4. If a participating member fails before acknowledging to the client, then the transaction continues on other members without any interruption. However, if that member contains the last copy of a table or bucket, then the transaction is rolled back. Note: In this release of SQLFire, a transaction fails if any of the cohorts depart abnormally.
Sequence of Events for a Distributed Transaction Here is a step-by-step description of events that occur before and during the transaction commit sequence. Before a transaction is committed, the following events occur: 1. When a transaction is started the transaction coordinator creates a globally unique ID (TXID) and creates a work space (TXState) to track operations. If the transaction is started from a thin client connection, the coordination happens on the server to which the client is connected. Note that transactions are started implicitly in JDBC: the end of one transaction implicitly starts a new one. 2. All updates in the scope of a transaction are immediately propagated to all replicas in parallel and are partially coordinated on each data store (cohort). Each member that owns the data involved in the transaction manages the state in a local TXState. When updates are received by a cohort, it attempts to obtain local write locks on the rows. A cohort fails immediately if the rows are already locked by another transaction. Failure to obtain locks results in the coordinator implicitly rolling back the entire transaction and releasing all the locks on the data hosts. 3. SQLFire only detects Write-Write conflicts. To prevent rows fetched in the transaction from being modified before transaction commit, SQLFire supports select for update where the selected rows are locked before the result set can be returned. 4. While the transaction is in progress the updates are maintained only in the TXState on the data stores, completely isolated from other concurrent connections. 5. Any constraint checks on the rows are immediately applied, and failures result in a constraint violation exception. Note: In this release of SQLFire, constraint violations also implicitly roll back the transaction. 6. Readers do not normally acquire any locks unless the rows being read are about to be committed. Transactional read operations are applied to the TXState first, before they are applied to the committed state. 7. When using the REPEATABLE_READ isolation level, all rows from selects are written first to the TXSTate before they are returned to the application. These events occur during the transaction commit sequence: 1. When a commit message is received by the coordinator, it dispatches a single commit message to all of the cohorts. Because the rows are already locked and constraints applied during the transaction, it is certain that the transaction will not fail due to conflicts. 2. Each cohort guarantees transaction atomicity by making sure that no concurrent transaction can see partially-committed state on the data store. Note that even though SQLFire guarantees that no concurrent access sees the partially-committed state of a transaction on any single cohort, the commit phase does not guarantee that to be the case across all of the cohorts that are involved in the transaction. 3. Each cohort applies the TXState changes to the tables in memory, and releases all the locks before acknowledging to the coordinator.
160
vFabric SQLFire User's Guide
Using Distributed Transactions in Your Applications
Note: Because the outcome of the transaction is assured at commit time, the coordinator does not wait for individual commit replies from the cohorts before returning the committed transaction to the initiating thread. If the same connection immediately initiates another operation on the same data, then the cohorts wait for pending replies from the previous transaction (as described in Step 3) before applying the change. Also, a commit or rollback operation takes place even if the initiating client process becomes unavailable while the commit or rollback is performed.
SQLFire Transaction Design SQLFire implements optimistic transactions. The transaction model is highly optimized for colocated data, where all of the rows updated by a transaction are owned by a single member. SQLFire avoids the use of a centralized distributed lock manager and the traditional 2-phase commit protocol. Transactional state is managed on each data store that is affected by the transaction, using only local locks. This allows the cluster to scale even when applications utilize transactions. When 2-phase commit is used, SQLFire performs second-phase commit actions in the background, but ensures that the connection that initiated the transaction sees only the committed results. You can change this default behavior using the sync-commits property. SQLFire uses an "eager lock, fail fast" algorithm that capitalizes on the fact that updates are reliably and synchronously propagated to all cohorts (mainly replicas). The main ideas behind this algorithm are summarized as follows: • Acquire eager locks. Each transactional write operation is synchronously propagated to each replica where a local transaction coordinator acquires a LOCAL write lock on the key. • Fail fast. If the write lock cannot be acquired, presumably due to a concurrent, conflicting transaction, then the write backs off and marks the transaction for rollback. The transaction outcome cannot be reversed. • Transaction state. All the changes in a transaction are maintained on every member affected by the transaction (every member that hosts a copy of a row changed in the transaction) in a transaction state. The changes are applied locally to the underlying table data only on commit. This allows readers to execute concurrently with a single writer without requiring any locks or blocking in the READ_COMMITTED isolation level. The focus for this design is on "optimistic transactions" and the design makes these important assumptions: • The typical transaction duration is short. • Conflicts between transactions are rare. If concurrent transactions tend to conflict, it is the application's responsibility to retry the failed transaction. Using this design provides the potential for linear scaling. Without centralized lock management, transaction throughput can easily scale with additional members. Transaction processing involves the data stores plus a coordinating peer. Thus if the concurrent transaction workload is uniformly spread across the data set, increasing the number of data stores also balances the workload and increases the aggregate transaction throughput. The design also removes the colocation restriction for the transactional working set, because transactions can involve any number of data hosts. Transaction performance is also increased, as compared to transactions that use a centralized lock manager.
Best Practices for Using Transactions For optimum results, take note of best practices for working with SQLFire transactions. • For high performance, mimimize the duration of transactions to avoid conflicts with other concurrent transactions. If atomicity for only single row updates is required, then completely avoid using transactions because SQLFire provides atomicity and isolation for single rows without transactions.
161
Developing Applications with SQLFire
• Unlike in traditional databases, SQLFire transactions can fail with a conflict exception on updates instead of on commit. This choice makes sense given that the outcome of the transaction has been determined to fail. • When using transactions, keep the transaction duration and the number of rows involved in the transaction as low as possible. SQLFire acquires locks eagerly, and long-lasting transactions increase the probability of conflicts and transaction failures. • To the extent possible, model your database so that most transactions operate on colocated data. When all transactional data is on a single member, then stricter isolation guarantees are provided. • If your application spawns multiple threads or connections to work on committed data, consider setting the sync-commits conenction property to "true." By default SQLFire performs second-phase commit actions in the background, but ensures that the connection that issued the transaction only sees completed results. However, other threads or connections may see different results until the second-phase commit actions complete. setting sync-commits=true ensures that the current thin client or peer client connection waits until all second-phase commit actions complete.
Transaction Functionality and Limitations Take note of transaction behavior and limitations in this release of SQLFire. In this release of SQLFire, the scope for transactional functionality is: • The result set that is obtained from executing a query should either be completely consumed, or the result set is explicitly closed. Otherwise, DDL operations wait until the ResultSet is garbage-collected. • Transactions for persistent tables are enabled by default, but the full range of fault tolerance is not yet implemented. It is assumed that at least one copy of a row is always available (redundant members are available) in the event of member failures. • SQL statements that implicitly place locks, such as select for update, are not supported outside of transactions (default isolation level). • The supported isolation levels are 'READ COMMITTED' and 'READ UNCOMMITTED' where both behave as 'READ COMMITTED.' Autocommit is OFF by default in SQLFire, unlike in other JDBC drivers. • Transactions always do "write-write" conflict detection at operation or commit time. Applications do not need to use select for update or explicit locking to get this behavior, as compared to other databases. (select for update is not supported outside of a transaction.) • Nested transactions and savepoints are not supported. • SQLFire does not restrict concurrent non-transactional clients from updating tables that may be involved in transactions. This is by design, to maintain very high performance when no transactions are in use. If an application uses transactions on a table, make sure the application consistently uses transactions when updating that table. • All DML on a single row is atomic in nature inside or outside of transactions. • There is a small window during a commit when the committed set is being applied to the underlying table and concurrent readers, which do not consult any transactional state, have visibility to the partially-committed state. The larger the transaction, the larger the window. Also, transaction state is maintained in a memory-based buffer. The shorter and smaller the transaction, the less likely the transaction manager will run short on memory.
162
vFabric SQLFire User's Guide
Chapter
25
Using Data-Aware Stored Procedures A stored procedure is an application function call or subroutine that is managed in the database server. Because multiple SQLFire members operate together in a distributed system, procedure execution in SQLFire can also be parallelized to run on multiple members, concurrently. A procedure that executes concurrently on multiple SQLFire members is called a data-aware procedure. Data-aware procedures use an extended CALL syntax with an ON clause to designate the SQLFire members on which the procedure executes. When you invoke a procedure, the SQLFire syntax provides the option to parallelize the procedure execution on: • All data stores in the SQLFire cluster • A subset of data stores (on members that belong to one or more server groups) • All data store members that host a table • All data store members that host a subset of data in a table SQLFire executes the user code in-process to where the data resides, which provides very low-latency access to colocated data. (This is in contrast to map-reduce job execution frameworks like Hadoop, where data has to be streamed from processes into a Task process.) Procedures often return one or more result sets. SQLFire streams a result set to one coordinating member that can perform the reduction step on the results (typically this involves aggregation, as in map-reduce). In SQLFire, the reduction step is carried out by a result processor. SQLFire provides a default result processor, and you can also develop your own result processor implementations to customize the reduction step. The sections that follow describe how to configure, invoke, and develop stored procedure and result processor implementations in SQLFire.
Configuring a Procedure You must configure a procedure implementation in SQLFire before you can invoke the procedure. Before you configure a procedure, ensure that the procedure implemention is available in the SQLFire classloader. See Storing and Loading JAR Files in SQLFire on page 131. The syntax for configuring a procedure in SQLFire is as follows. You designate a procedure as data-aware or data-independent when you invoke it. See Invoking a Procedure on page 165. CREATE PROCEDURE procedure-name ([ procedure-parameter [, procedure-parameter] * ]) LANGUAGE JAVA PARAMETER STYLE JAVA { NO SQL | CONTAINS SQL | READS SQL DATA | MODIFIES SQL DATA }
163
Developing Applications with SQLFire
[ [DYNAMIC] RESULT SETS integer] EXTERNAL NAME 'procedure_external_class.method' The procedure-name is a SQL identifier that you can use to call the procedure implementation in SQLFire. The procedure_external_name specifies the actual static class_name.method_name of the Java procedure implementation. Using the Procedure Provider API on page 169 provides more information about implementing procedures. One or more procedure-parameter entries use the syntax: [ { IN | OUT | INOUT } ] [parameter_name] DataType Each parameter entry should match a corresponding parameter in the procedure's Java implementation. SQLFire supports the data types described in Data Types on page 638, including user-defined types (see Programming User-Defined Types on page 177). The results of a procedure, if any, can be supplied as OUT parameters, INOUT parameters, or as dynamic result sets. A client retrieves OUT parameters using methods in java.sql.CallableStatement. The NO SQL, CONTAINS SQL, READS SQL DATA, and MODIFIES SQL DATA options are used to describe the type of SQL statements that the procedure uses. Choose one of the available options: • NO SQL indicates that the stored procedure does not execute any SQL statements. • CONTAINS SQL indicates that the procedure does not execute SQL statements that read nor modify SQL data. • READS SQL DATA indicates that the procedure does not execute SQL statements that modify SQL data, but may issue other statements (such as SELECT statements) to read data. • MODIFIES SQL DATA indicates that the procedure can execute any SQL statement except those that are specifically disallowed in stored procedures. SQLFire uses MODIFIES SQL DATA as the default. SQLFire throws an exception if a procedure attempts to execute SQL statements that conflict with the NO SQL, CONTAINS SQL, or MODIFIES SQL DATA setting. RESULT SETS indicates the estimated upper bound of returned result sets for the procedure. Access result sets by calling the getMoreResults() and getResultSet() statements in java.sql.Statement. EXTERNAL NAME specifies the fully qualified class name and method name of the procedure to create. Example CREATE PROCEDURE SALES.TOTAL_REVENUE(IN S_MONTH INTEGER, IN S_YEAR INTEGER, OUT TOTAL DECIMAL(10,2)) LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA EXTERNAL NAME 'com.sqlfire.funcs.Revenue.calculateRevenueByMonth'
Configuring a Custom Result Processor Data-aware procedures use a separate result processor to merge the procedure results from multiple SQLFire members. You can use the default SQLFire result processor, or implement and configure your own result processor implementation to customize the merge behavior. Using the Procedure Provider API on page 169 describes how to implement a custom result processor.
164
vFabric SQLFire User's Guide
Using Data-Aware Stored Procedures
Note: When working with custom result processors, you may find it helpful to create an alias for the result processor class name. You can use the alias in place of the full result processor class name when you invoke a data-aware procedure. To create an alias use the SQL statement: CREATE ALIAS processor_alias FOR 'processor_class_name' where processor_class_name is the full name of a class that implements com.vmware.sqlfire.ProcedureResultProcessor and has a default constructor.
Invoking a Procedure SQLFire uses an extended CALL syntax for invoking data-aware procedures. The SQLFire syntax for invoking a procedure is: CALL procedure_name ( [ expression [, expression ]* ] ) [ WITH RESULT PROCESSOR processor_class ] [ { ON TABLE table_name [ WHERE whereClause ] } | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name]* ) } } ] Use the optional ON and WHERE clauses to provide routing hints to the SQLFire engine, in order to prune the procedure execution to a subset of SQLFire members: • ON TABLE executes the procedure code on SQLFire members that host data for the table. With partitioned tables, you can also use the WHERE clause to further specify only those members host specific data values for the table. Note that both the ON TABLE and WHERE clauses are only used to prune procedure execution to specific members; the clauses do not limit the data results, and neither the ON TABLE nor the WHERE clause restriction is applied to queries within the procedure body. • ON ALL executes the procedure code on all SQLFire members, while ON SERVER GROUPS executes the procedure code on one or more named server groups. Note: Specifying ON TABLE, ON ALL, and ON SERVER groups also affects the scoping of nested queries within the procedure implementation. See Populating Result Sets on page 170 and Using the and Escape Syntax with Nested Queries on page 172 for more information. If you omit the ON clause, SQLFire executes the procedure as a data-independent procedure on the local, coordinating member. If you omit the WITH RESULT PROCESSOR clause when calling a data-aware procedure, then SQLFire uses a default result processor implementation. If the called procedure does not specify an OUT parameter or result set, then SQLFire calls the procedure asynchronously, without waiting for a reply. SQLFire logs any error that occurs during procedure execution. High Availability for SQLFire Procedures If an error occurs before a procedure has returned any rows to the client, then SQLFire automatically re-executes the procedure. Procedure and result processor implementations can use the isPossibleDuplicate() method to determine if SQLFire re-executed a procedure after a SQLFire member failed. This type of detection is necesary for certain implementations that perform write operations that can lead to duplicate entries on procedure re-execution.
165
Developing Applications with SQLFire
If an error occurs after rows have been returned to the client, then SQLFire throws an exception and does not attempt to retry the procedure. Default Result Processor If you omit the WITH RESULT PROCESSOR clause when you invoke a data-aware procedure, SQLFire uses a default result processor implementation. The default result processor for data-aware procedures performs unordered merges on the dynamic result sets from each member that executes the procedure code. The default processor presents the same number of ResultSets to the JDBC client that were declared in the CREATE PROCEDURE statement. The default processor handles OUT and INOUT parameters as follows: • If the type is a basic SQL type, the default processor returns only the first value returned by the procedure. • If the type is a JAVA_OBJECT type, the values from all the servers that run the procedure are concatenated together as an object array and provided to the client as the object value. The ordering of the array elements is significant because a given array position corresponds to the same SQLFire member for each of the parameters. For example, the first element of each array corresponds to the OUT parameters that were provided by the same SQLFire member.
Example JDBC Client This example shows how a JDBC client can call a data-aware procedure and use a CallableStatement to work with the procedure's result sets. package com.vmware.sqlfire.jdbc; import import import import import import import import
java.io.Serializable; java.sql.CallableStatement; java.sql.Connection; java.sql.DriverManager; java.sql.ResultSet; java.sql.SQLException; java.sql.Statement; java.sql.Types;
import com.vmware.sqlfire.procedure.ProcedureExecutionContext; public class MyClient { public static class ExampleObj implements Serializable { private static final long serialVersionUID = 1L; private int val; public void setValue(int val) { this. val = val; } public int getValue() { return this.val; } } public static void main(String[] args) { try { Connection cxn = DriverManager.getConnection("jdbc:sqlfire:"); Statement stmt = cxn.createStatement(); stmt.execute("create type ExampleObjType external name '" 166
vFabric SQLFire User's Guide
Using Data-Aware Stored Procedures
+ ExampleObj.class.getName() + "' language java"); stmt.execute("CREATE PROCEDURE myProc " + "(IN inParam1 VARCHAR(10), " + + + + +
" OUT outParam2 INTEGER, " " INOUT example ExampleObjType, OUT count INTEGER)" "LANGUAGE JAVA PARAMETER STYLE JAVA " + "READS SQL DATA " "DYNAMIC RESULT SETS 2 " + "EXTERNAL NAME '" ProcedureTest.class.getName() + ".myProc'");
stmt.execute("create table MyTable(x int not null, y int not null)"); stmt.execute("insert into MyTable values (1, 10), (2, 20), (3, 30), (4, 40)"); CallableStatement callableStmt = cxn .prepareCall("{CALL myProc('abc', ?, ?, ?) ON TABLE MyTable WHERE x BETWEEN 5 AND 10}"); callableStmt.registerOutParameter(1, Types.INTEGER); callableStmt.registerOutParameter(2, Types.JAVA_OBJECT); callableStmt.registerOutParameter(3, Types.INTEGER); callableStmt.setObject(2, new ExampleObj()); callableStmt.execute(); int outParam2 = callableStmt.getInt(1); ExampleObj example = (ExampleObj)callableStmt.getObject(2); assert example.getValue() == 100; assert outParam2 == 200; ResultSet thisResultSet; boolean moreResults = true; int cnt = 0; int rowCount = 0; do { thisResultSet = callableStmt.getResultSet(); int colCnt = thisResultSet.getMetaData().getColumnCount(); if (cnt == 0) { System.out.println("Result Set 1 starts"); while (thisResultSet.next()) { for (int i = 1; i < colCnt + 1; i++) { System.out.print(thisResultSet.getObject(i)); if (i == 1) { System.out.print(','); } } System.out.println(); rowCount++; } System.out.println("ResultSet 1 ends\n"); cnt++; } else { thisResultSet.next(); System.out.println("ResultSet 2 starts"); for (int i = 1; i < colCnt + 1; i++) { cnt = thisResultSet.getInt(1); System.out.print(cnt); System.out.println(); } 167
Developing Applications with SQLFire
System.out.println("ResultSet 2 ends"); } moreResults = callableStmt.getMoreResults(); } while (moreResults); assert rowCount == cnt; assert rowCount == 4; } catch (SQLException e) { e.printStackTrace(); } } public static void myProc(String inParam1, int[] outParam2, ExampleObj[] example, int[] count, ResultSet[] resultSet1, ResultSet[] resultSet2, ProcedureExecutionContext ctx) throws SQLException { Connection conn = ctx.getConnection(); ExampleObj obj = new ExampleObj(); obj.setValue(100); example[0] = obj; outParam2[0] = 200; Statement stmt = conn.createStatement(); stmt.execute("select * from mytable"); resultSet1[0] = stmt.getResultSet(); Statement stmt3 = conn.createStatement(); stmt3 .execute("select count(*) from mytable"); stmt3.getResultSet().next(); Integer cnt = stmt3.getResultSet().getInt(1); count[0] = cnt; Statement stmt2 = conn.createStatement(); stmt2.execute("select count(*) from mytable"); resultSet2[0] = stmt2.getResultSet(); } }
168
vFabric SQLFire User's Guide
Chapter
26
Using the Procedure Provider API SQLFire provides an API to help you develop data-aware procedures. A ProcedureExecutionContext object provides information about the table used to filter the procedure execution, colocated data, and other information about the context in which a procedure is executed. An OutgoingResultSet interface enables you to construct a result set by adding rows or columns to a List object. Use the information in this section to develop and compile your procedure, and then add the implementation to SQLFire using the instructions in Storing and Loading JAR Files in SQLFire on page 131. You can then configure and execute the procedure using the information in Using Data-Aware Stored Procedures on page 163.
Procedure Parameters When you configure a procedure using the CREATE PROCEDURE statement, SQLFire assembles the method parameters and passes them to the procedure implementation class using Java reflection. The different types of parameters are handled in the following ways: • IN and INOUT parameters are passed to the procedure implementation as single-element arrays. • DYNAMIC RESULT SETS specified in the CREATE PROCEDURE statement cause additional method arguments to be appended, one for each dynamic result set. Each argument is treated as a ResultSet[] type. SQLFire passes in each argument as a single-element array containing a null value. For example, if you specify DYNAMIC RESULT SETS 2, SQLFire appends two additional ResultSet[] arguments, each with a single null value. ResultSets are returned to the application through the CallableStatement, in the order that they are defined in the procedure body. See Populating Result Sets on page 170. • The procedure implementation class can optionally specify a ProcedureExecutionContext parameter as the last parameter. SQLFire then passes in the procedure context object, which the implementation can use to determine information about the execution context. Note: The CREATE PROCEDURE and CALL statements should not reference the procedure context object as a procedure parameter. For example, if the Java method signature of the procedure implementation is: package com.acme.MyProc; import java.sql.*; public class MyProc {public static void myMethod(String inParam1, Integer[] outParam2, Date[] inoutParam3, 169
Developing Applications with SQLFire
Widget[] inoutParam4, ResultSet[] resultSet1, ResultSet[] resultSet2) { ... } } You would configure the procedure in SQLFire using a statement similar to: CREATE PROCEDURE myProc (IN inParam1 VARCHAR(10), OUT outParam2 INTEGER, INOUT DATE inoutParam3, INOUT WidgetType inoutParam4) LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA DYNAMIC RESULT SETS 2 EXTERNAL NAME 'com.acme.MyProc.myMethod' Note: The formal parameter names are used as an example, and the actual parameter names need not match. Note: A user-defined type (WidgetType) implementation would have to be created elsewhere in this example. This same CREATE PROCEDURE statement could be used even if the Java implementation included a ProcedureExecutionContext in the method signature, as in the following example. package com.acme.MyProc; import java.sql.*; public class MyProc {public static void myMethod(String inParam1, Integer[] outParam2, Date[] inoutParam3, Widget[] inoutParam4, ResultSet[] resultSet1, ResultSet[] resultSet2, ProcedureExecutionContext context) { ... } }
Populating OUT and INOUT Parameters Your procedure implementation returns OUT and INOUT parameters if you set their values in the single-element array. For example, the following code sets an OUT and an INOUT parameter: outParam2[0] = 42; inoutParam3[0] = new java.sql.Date(System.currentTimeMillis());
Populating Result Sets The SQLFire API provides different ways to construct result sets in a procedure implementation.
170
vFabric SQLFire User's Guide
Using the Procedure Provider API
A procedure must open and generate a ResultSet either from the default connection (jdbc:default:connection) or from the connection that is obtained from the ProcedureExecutionContext object. SQLFire ignores result sets that are generated in any other manner. If your implementation requires a result set from a thin client connection or from a connection to an external database, then create an OutgoingResultSet to populate the results from the connection. See Using Result Sets and Cursors on page 181 for more information about using result sets in SQLFire. SQLFire returns ResultSets to the application through the CallableStatement, in the order that they are defined in the procedure body. Your procedure implementation can return fewer ResultSets than are defined in the DYNAMIC RESULT SETS clause; construct only those ResultSets that you require. Create all PreparedStatement or other Statement objects directly in the body of the procedure method that requires those objects. Do not attempt to cache statement objects as static variables. In contrast to a JDBC client application method, a Java procedure method cannot hold onto a JDBC object after it completes. Also, do not close a statement that generates a ResultSet, because doing so closes the ResultSet itself. The SQLFire API provides two ways to help you construct a result sets in the procedure implementation: • Execute a Query to Populate a Result Set on page 171 • Construct a Result Set with OutgoingResultSet on page 171 Execute a Query to Populate a Result Set The ProcedureExecutionContext provides a getConnection() method that returns a nested JDBC connection. You use this connection (or the default connection) to populate one or more ResultSets with a nested query. Data for the nested query is not manifested until next() is called on the ResultSet. SQLFire calls next() on the ResultSets as necessary to stream the required rows. For example: Connection cxn = context.getConnection(); Statement stmt = cxn.createStatement(); resultSet1[0] = stmt.executeQuery("select * from Bar where foo > 42"); resultSet2[0] = stmt.executeQuery("select * from Bar where foo <= 42"); SQLFire creates the one-element ResultSet arrays that hold the returned ResultSets. Note: Do not close the connection or the statement that you use to create the result set, because doing so also closes the result set. Remember that a data-aware procedure might be invoked with a WHERE clause that restricts execution to one or more SQLFire members, and the procedure implementation itself might execute a nested query on the same table. By default, nested queries execute only on those SQLFire members that were scoped at procedure invocation. Using the and Escape Syntax with Nested Queries on page 172 describes how to override the default scope for nested queries when required by your procedure implementation. Construct a Result Set with OutgoingResultSet As an alternative, the procedure can obtain an empty OutgoingResultSet object from the ProcedureExecutionContext or default connection, and then invoke addColumn() for each column of the result set followed by addRow() for each row. You can skip the initial calls to addColumn() if you want to use default column names, such as "c1," "c2," and so forth. When you use this method to construct a result set, SQLFire can immediately stream results after a call to addRow(), even while the procedure implementation continues to add rows.
171
Developing Applications with SQLFire
For example: OutgoingResultSet rs1 = context.getOutgoingResultSet(1); rs1.addColumn("field1"); rs1.addColumn("field2"); for (int i = 0; i < 10; i++) { rs1.addRow(new Object[i, String.valueOf(i)]); } rs1.endResults(); Note: Do not close the connection or the statement that you use to create the result set, because doing so also closes the result set.
Using the and Escape Syntax with Nested Queries A query that is nested within a data-aware procedure may be executed on local member data, or it may be re-distributed to the SQLFire cluster and executed on multiple SQLFire members. The and escape syntax can be used to override the default behavior for nested queries. If a data-aware procedure is invoked without specifying a WHERE clause in the CALL statement, then by default SQLFire treats a nested query in the procedure implementation as a "global" query. This means that the nested query is distributed to the SQLFire cluster and accesses table data from all SQLFire members that host data for the table. Note that global queries can return duplicate results with a single procedure call, because each member that executes the procedure body invokes a separate global query. If a procedure is invoked using a WHERE clause, then by default SQLFire treats the nested query in the procedure implementation as a "local" query. This means that the query only accesses the local partitioned table data that is assigned to the procedure on the SQLFire member where the procedure executes. You can override this default behavior by specifying either the "" or "" escape syntax at the very beginning of a nested query string in your procedure implementation. For example, the following nested queries always operate with local table data, even you invoke the procedure without specifying a WHERE clause: Connection cxn = context.getConnection(); Statement stmt = cxn.createStatement(); resultSet1[0] = stmt.executeQuery(" select * from Bar where foo > 42"); resultSet2[0] = stmt.executeQuery(" select * from Bar where foo <= 42"); Note: When you use the ON TABLE clause to invoke a procedure, any queries within the procedure body target only the primary data for a partitioned table on the local node. For replicated tables, a local query accesses only a single replica. In both cases, this avoids returning duplicate values in the procedure. When you use ON ALL to invoke a procedure, queries execute only the local member for partitioned table data. However, queries against replicated tables return duplicate results.
172
vFabric SQLFire User's Guide
Chapter
27
Using the Custom Result Processor API Data-aware procedures use a separate result processor to merge the procedure results from multiple SQLFire members. You can use the default SQLFire result processor when basic concatenation of the results are required. Default Result Processor on page 166 describes how the default processor works. For more complex use cases, such as sorting, merging, or joining the results from different servers, you must implement your own result processor to customize the merge behavior. A custom processor can be used to modify a procedure's OUT parameters and result sets before emitting the results to a client. The sections that follow describe how to use the SQLFire result processor API to implement your own custom result processor. Sample code is provided for a result processor that performs comparison-based sorting (merge sort) of data-aware procedure results.
Implementing the ProcedureResultProcessor Interface A custom result processor must implement the SQLFire ProcedureResultProcessor interface. SQLFire calls the processor implementation after a client invokes a procedure using the WITH RESULT PROCESSOR clause and begins retrieving results. The basic result processor API operates in the following manner: 1. SQLFire calls the init() method of the implementation to provide the ProcedureProcessorContext object. The ProcedureProcessorContext provides important details about the procedure implementation and the procedure call that is returning results to the client. ProcedureProcessorContext describes the methods that you can use to obtain context information and obtain nested JDBC connections. 2. Your implementation uses the ProcedureProcessorContext object to obtain the OUT parameters and result sets that the procedure is returning to the client, so that it can process the results as necessary. OUT parameters and result sets are obtained using the getIncomingOutParameters() and getIncomingResultSets() methods, respectively. Both of these methods returns an IncomingResultSet, which you can use to obtain metadata about the result set itself and inspect individual rows of the result set. 3. SQLFire calls the getNextResultRow() and getOutParameters() methods in your implementation to provide modified results to the client. The getNextResultRow() method of your implementation is called each time the next row of a particular result set must be returned to the client. The getOutParameters() should return all of the procedure's OUT parameters to the client. 4. SQLFire calls your implementation's close() method when the associated statement is closed and the output processor is no longer required. Free resources and perform any final cleanup in this method. Procedure Result Processor Interfaces on page 480 provides a complete reference to the methods defined in ProcedureResultProcessor, ProcedureProcessorContext, and IncomingResultSet.
Example Result Processor: MergeSort This example result processor implementation uses a merge sort algorithm to return sorted results to a client. 173
Developing Applications with SQLFire
The MergeSort processor is designed for use with a single procedure implementation that returns only a single result set and no OUT parameters. The procedure implementation is shown in Procedure Implementation on page 174, and the procedure would be configured in SQLFire using a statement similar to the following. CREATE PROCEDURE MergeSort () LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA DYNAMIC RESULT SETS 1 EXTERNAL NAME 'examples.MergeSortProcedure.mergeSort' Your own result processor implementations may need to work with the output from multiple procedures, and take into account the possibility of multiple result sets and OUT parameters accordingly. Procedure Implementation The example result processor supports this procedure implementation. The procedure uses a single result set and no OUT parameters. package examples; import com.vmware.sqlfire.*; import java.sql.*; public class MergeSortProcedure { static final String LOCAL = ""; public static void mergeSort(ResultSet[] outResults, ProcedureExecutionContext context) throws SQLException { String queryString = LOCAL + "SELECT * FROM " + context.getTableName(); Connection cxn = context.getConnection(); Statement stmt = cxn.createStatement(); ResultSet rs = stmt.executeQuery(queryString); outResults[0] = rs; // Do not close the connection since this would also // close the result set. } } Merge Sort Result Processor This result processor implementation sorts the results from the MergeSortProcedure, which returns a single result set. package examples; import com.vmware.sqlfire.*; import java.sql.*; import java.util.*; public class MergeSortProcessor implements ProcedureResultProcessor { private ProcedureProcessorContext context; public void init(ProcedureProcessorContext context) { this.context = context;
174
vFabric SQLFire User's Guide
Using the Custom Result Processor API
} public Object[] getOutParameters() { throw new AssertionError("this procedure has no out parameters"); } public Object[] getNextResultRow(int resultSetNumber) throws InterruptedException { // this procedure deals with only result set number 1 assert resultSetNumber == 1; IncomingResultSet[] inSets = context.getIncomingResultSets(1); Object[] lesserRow = null; Comparator cmp = getComparator(); IncomingResultSet setWithLeastRow = null; for (IncomingResultSet inSet : inSets) { Object[] nextRow = inSet.waitPeekRow(); // blocks until row is available if (nextRow == IncomingResultSet.END_OF_RESULTS) { // no more rows in this incoming results continue; } // find the least row so far if (lesserRow == null || cmp.compare(nextRow, lesserRow) <= 0) { lesserRow = nextRow; setWithLeastRow = inSet; } } if (setWithLeastRow != null) { // consume the lesserRow by removing lesserRow from the incoming result set Object[] takeRow = setWithLeastRow.takeRow(); assert takeRow == lesserRow; } // if lesserRow is null, then there are no more rows in any incoming results return lesserRow; } public boolean getMoreResults(int nextResultSetNumber) { return false; // only one result set } public void close() { this.context = null; } /** Return an appropriate Comparator for sorting the rows */ private Comparator