Transcript
BSA Maintenance / DB Cleanup Best Practice 1
© Copyright 12/5/2012 BMC Software, Inc
Housekeeping
Please ask questions in the “Q&A” section, not in Chat:
- Many “Q&A” questions can be addressed during the session by our experts, while Chat is not seen by the Presenter until the end of the session
https://communities.bmc.com/communities/docs/DOC-21692
BMC Server Automation (BladeLogic) v8.2 Best Practices
Maintenance & DB Cleanup 1
Sean Berry
Lead, Customer Engineering Operations
Overview
- First Level Training
- Best Practice vs. How To
- Covers Most Common Maintenance Tasks
- Does not address every scenario
- Assumes prior knowledge of BSA components and terms
Agenda
- Activities & Objects
- Maintenance Overview
- Assessment (Give it to me straight)
- Performance Monitoring – Basic
- Cleanup – Database
- Cleanup – Fileserver & Appserver
- Agent Health & Agent Cleanup
- Upgrades / Upkeep
- Configuration Guidance
- Questions & Feedback
Introduction
Artifacts in the “Best Practices” franchise
- BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692
- BSA Best Practices Webinar Episode 1: Deployment Architecture: https://communities.bmc.com/communities/docs/DOC-21693
- BSA 8.2 base documentation: https://docs.bmc.com/docs/display/bsa82/Home
- Deployment Architecture: https://docs.bmc.com/docs/display/bsa82/Deployment+architecture
- Sizing and Scalability: https://docs.bmc.com/docs/display/bsa82/Sizing+and+scalability+factors
- Disaster Recovery and High Availability: https://docs.bmc.com/docs/display/bsa82/High+availability+and+disaster+recovery
- Large Scale Installations: https://docs.bmc.com/docs/display/bsa82/Large-scale+installations
- App Server sizing spreadsheet (internal)
- BSA Database Cleanup Best Practice White Paper (internal)
https://docs.bmc.com/docs/display/NP/BSA+Database+Cleanup
Agent Cleanup blcli “Delete cleanup*” spaces
Activities & Objects
Activities & Objects – Typical Usage
Historical data is the data related to job run logs, results, and schedules. It is stored in the job results, job events, compliance results, audit results, snapshot results, and job schedule tables.
Why Do Maintenance
- Same as any piece of machinery or other system: for smooth running
- Conserve DB & FS Capacity
- Ensure Performance
- Maintain Capacity & Capability
- Purge records once no longer needed / desired
- Meet data storage compliance requirements (PCI, GLB, SOX)
Activities & Objects – Typical Usage (cont’d)
All activities use storage of some sort:
• All jobs generate some database records
  • Normal Job Logs & Job Data
  • A large or verbose NSH Script or Deploy Job can easily generate 10,000 rows of Job Run Event Logs
  • Snapshot Job: 1 run * 1000 targets * 1000 objects = 1,000,000 rows
• File Server
  • Software package of SQL Server 2008 full install is 2GB; many agents have footprints >50-100MB
• Agent
  • Packages staged for deployment
  • Activity, Agent, other logs
• App Server
  • Temporary files
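The Snapshot Job arithmetic on this slide can be turned into a rough estimator. This is a minimal sketch, assuming one database row per object per target per run, which is what the slide's multiplication implies; the function name and the weekly figure are illustrative, not measured values.

```python
def estimated_rows(runs: int, targets: int, objects_per_target: int) -> int:
    """Rough row count, assuming one row per object per target per run."""
    return runs * targets * objects_per_target


# The slide's example: a single Snapshot Job run against 1000 targets.
single_run = estimated_rows(runs=1, targets=1000, objects_per_target=1000)

# Hypothetical follow-up: the same job scheduled once a day for a week.
one_week = estimated_rows(runs=7, targets=1000, objects_per_target=1000)

print(f"single run: {single_run:,} rows")
print(f"one week of daily runs: {one_week:,} rows")
```

Multiplying by the number of scheduled runs makes it easier to see why environment-wide jobs that run frequently dominate database growth.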
Concepts & Terminology
Soft delete:
- Object’s “is_deleted” column set to “1”
Hard delete:
- Object (and dependencies) are permanently removed from the database.
Retention Policy:
- How long to keep a given object before cleaning it up. Some jobs, runs, or objects aren’t needed for more than a few days; some need to stay around for at least 90 days (Patch Catalog Job runs), depending on use case.
A typical “run” object’s “weight” or “balance”:
- If a given object has many rows of data (10,000 or more) associated with it, we say it’s “heavy”, since it will take a relatively long time to clean up the many rows required to clear that one object.
Concepts & Terminology (cont’d)
Leaf object
- An object that has no dependencies and that nothing depends on (like a log file entry)
Dependent object
- A given run of a job is only relevant if the original job object is present, so the run should be cleaned up when the job itself is cleaned up.
Stored Procedure
- A particular type of SQL query that is kept within the database, and may perform relatively quickly.
Truncate Cleanup
- Quickly removes all objects older than a certain date. Only practical on leaf objects.
UNDO space & REDO logs
- Very long running queries can queue up other transactions and put strain on some critical DB resources
Concepts & Terminology (cont’d)
Historical Data
- Job Run Logs, Job Results (Snapshot, Audit, Compliance), prior Job Schedules and Object Audit Trail objects. These types of data typically consume the bulk of the space in the database and are responsible for the bulk of the database growth on a daily basis, with the Job Run Event Log data (JobRunEvent) and the Audit Trail data (AuditTrail) typically consuming the most space.
Shared Object Data
- This data is typically related to Snapshot and Audit data, and can consume a large amount of database space, especially if Inventory Snapshots (or any other environment-wide Snapshot Jobs or Audit Jobs) are run on a frequent basis.
Soft deleted Data
- Any time an object is deleted in the GUI, or deleted by the retention policy, this data must eventually be hard-deleted from the database by the cleanupDatabase command.
File Server Data
- After objects are removed from the database, the actual underlying file system objects need to be removed from the file server. Note that the cleanupDatabase step must complete before the cleanupFileServer is run.
Maintenance Overview
Kinds of Maintenance
- Health / Performance Monitoring (find the problems before they’re out of control)
- Normal day-to-day deletes & dependency checking
- Setting Boundaries (Retention & Job Timeouts)
- Database cleanup
  – Applying retention policies
  – Historical Cleanups
  – Hard Deletes
- Filesystem-based cleanup
  – Agent
  – Appserver
  – Fileserver
Assessment: (How Much Work Do I Have Ahead of Me?)
Assessment: How much work do we have ahead?
DB Cleanup:
- When was Gather_stats last run? Was it the “right” gather_stats process?
- Table sizes script: how big are my tables? (in millions of rows)
- When’s the last time we ran cleanup? (Last Cleanup Task Ran)
- Several scripts available for this task (see DB Cleanup White Paper)
FS Cleanup:
- How big is my total file server footprint? (du -sk)
- How many blpackages do I have? (ls -d blpackages/* | wc -l)
- Am I creating blpackages automatically and not cleaning them up?
- How much of what I have do I not need anymore?
Appserver Cleanup:
- How much space is being used on our appservers in the tmp folders? (not /tmp)
- Is space a problem on the appserver? (df -k)
How much work do we have ahead? (cont’d)
Agent Cleanup:
- What’s our agent footprint? (du -sk in the agent directory)
- Do we have lots of old deploy packages on our agents? (spot check)
- Are we tight on space on our agents, or do we see new deployments coming that will need more or much more space? (big installers)
Stability (some things to think about):
- How stable is my environment? Do I need to restart appservers more than weekly?
- Where should I be focusing my energy?
- Am I adhering to best practices for component placement?
- Do I have unresolved tickets I could close or respond to?
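The footprint questions for the file server and the agents can also be answered from a script when shell access is awkward. This is a minimal Python sketch; the function names are illustrative and not part of BSA. It sums apparent file sizes, so the result can differ somewhat from du -sk, which reports allocated disk blocks.

```python
import os
import sys


def count_entries(path: str) -> int:
    """Count non-hidden entries directly under path, like `ls -d path/* | wc -l`."""
    return sum(1 for entry in os.scandir(path) if not entry.name.startswith("."))


def tree_size_kb(path: str) -> int:
    """Sum apparent file sizes under path, in whole kilobytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so link targets are not counted twice.
            if not os.path.islink(full_path):
                total_bytes += os.path.getsize(full_path)
    return total_bytes // 1024


if __name__ == "__main__":
    # Usage: python footprint.py <directory>   (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{count_entries(target)} entries, {tree_size_kb(target)} KB")
```

Recording both numbers on a schedule gives a trend line, which is more useful than a single spot check for deciding how urgent a cleanup is.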
Performance Monitoring - Basic
Performance Monitoring – Basic
Availability:
- How long “should” a given job type take to run in your environment?
- Would you notice if a job or config server instance were down or unresponsive?
- Before your customers called you to tell you?
Capacity:
- Do you know the rough capacity of your environment? How many jobs can run across the entire environment? How many total work item thread minutes in an hour?
- How many GUI users, CLI users can be logged into your environment? How many did you build the environment for?
Performance:
- Do you notice long running (>6, >24 hr) jobs? Are they normal and expected?
- What’s the typical load average on your app servers? Database? Swap utilization? How much is too much?
Performance Monitoring – Basic
How much further can your environment grow before the next app server gets added?
- Next X amount of database capacity?
- File server capacity?
- Reports server capacity?
- (Sizing spreadsheet and available performance capacity)
GUI:
- Does the GUI have enough memory to run on your user’s workstation without swapping? How big is the typical footprint in your environment? What views can be closed? Paging may be a factor here.
- What’s the typical ping time & bandwidth between the different components of your installation? (A long ping time to agents & repeaters is fine within reason; GUIs and appservers shouldn’t be “far” apart from each other or the DB.)
- Do you measure and record this regularly (monitoring infrastructure)?
Performance Monitoring - Tools
- BPPM 9.0 Knowledge Module
- JMXCLI
  – Very detailed range of configurations
- Other Performance / Capacity Management Tools (BCO etc.)
- Classical tools:
  – Vmstat/iostat/top/etc. (UNIX)
  – nstats (NSH)
  – Perfmon (Windows)
- Empirical tools:
  – GUI performance
Cleanup - Database
[Figure: BMC Server Automation (BladeLogic) architecture diagram showing the Depot and the Console, Mid Tier, and Nodes tiers]
Configuring BSA DB Cleanup
- To manage database size, you must proactively set up DB Cleanup and run its job(s) at regular intervals.
- The first setup step is to configure a data retention policy.
Database Retention Policies
- The retention policy is the time period for which you need to keep your data in the BSA database.
- If the organization's policy is to keep only the last 90 days of compliance data, then set the retention policy to 90 (days) so that any data older than 90 days will be deleted by the cleanup operation.
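To make the 90-day example concrete, here is a minimal Python sketch of the date arithmetic behind a retention window. It only illustrates the idea; it is not how BSA evaluates retention internally, and the function names are made up for this example.

```python
from datetime import date, timedelta


def retention_cutoff(today: date, retention_days: int) -> date:
    """Oldest date that is still inside the retention window."""
    return today - timedelta(days=retention_days)


def is_expired(created: date, today: date, retention_days: int) -> bool:
    """True when the record is older than the retention window."""
    return created < retention_cutoff(today, retention_days)


today = date(2012, 12, 5)
print("cutoff:", retention_cutoff(today, 90))
print("result from 2012-08-01 expired?", is_expired(date(2012, 8, 1), today, 90))
print("result from 2012-11-01 expired?", is_expired(date(2012, 11, 1), today, 90))
```

A record created exactly on the cutoff date is treated as still inside the window here, matching the slide's wording that data older than 90 days is deleted.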
Configuring BSA DB Cleanup (cont’d)
Setting a DB Retention Policy
- The retention policy is set in two places:
  – The default time (in days) is set in the Property Dictionary in the Job class
  – The policy can be set at the individual job level (instance level), or for a given job type (Property Set Class: PSC)
  – Known issue: setting it at the property set class level does not propagate the settings to all dependent jobs (e.g. of a patching job)
For Auto-generated Depot objects
- The retention time for auto-generated depot objects and jobs is defined by the BLASADMIN system property AutoGeneratedRetentionTime. E.g.:
  » blasAdmin: set Cleanup AutoGeneratedRetentionTime
  » blasAdmin: set cleanup EnableRetentionPolicy true
Default Retention Windows in the Property Dictionary
RESULTS_RETENTION_TIME Property (Default)
Setting NSH Script Job RESULTS_RETENTION_TIME to 90 days
Configuring BSA DB Cleanup (cont’d)
Applying a Retention Policy
- After you define the retention policy, you need to apply it to objects. This marks all the relevant objects older than their retention limit for deletion.
- Objects marked for deletion are actually deleted when the DB Cleanup operation is run.
- Applying the retention policy is required before you run DB Cleanup.
- The retention policy is usually applied by running this BLCLI command: Delete ExecuteRetentionPolicy
Configuring BSA DB Cleanup (cont’d)
What does the Retention Policy blcli command do?
- Soft deletion
- It is run regularly by the DB Cleanup job that is provided out of the box (OOTB) in BSA: the BSA Recommended Database Cleanup Job (an NSH script job)
- Customers usually ETL the BSA DB before they run DB Cleanup to ensure they have their oldest data in the BDS warehouse DB for reporting.
- Execute ETL, then run the BSA Recommended Database Cleanup Job
- Documentation on using this OOB BSA job and its parameters is here: Changing Database Cleanup script options and commands
Out of the Box Database Cleanup Script
Configuring BSA DB Cleanup (cont’d)
Historical Data
- This type of data is typically responsible for the bulk of the database growth on a daily basis, with the Job Run Event Log data (JobRunEvent) and the Audit Trail data (AuditTrail) usually consuming the most space.
- If you have a very large BSA DB and you have never run DB cleanup before, we recommend that before you run the OOB NSH job, you execute the following commands manually, in sequence, to clean up your BSA DB in incremental steps:
  – BLCLI Delete cleanupHistoricalData JobRunEvent
  – BLCLI Delete cleanupHistoricalData AuditTrail
  – BLCLI Delete cleanupHistoricalData JobSchedule
What if I can’t run a full cleanup? (all at once & common problems)
JobRunEvent (JRE) / Audit Trail are very big
- Offline cleanup: relatively fast, but requires BSA to be completely offline, no running appservers (truncate)
Other tables are way too big, or a complete outage isn’t possible
- Online “small” cleanup: the size of the cleanup task depends on how many “days” of data you want to clean up. Start with relatively “few” days of cleanup, and nibble away at it depending on your available resources.
- Schedule historical cleanups to run in available windows
Cleanups sometimes compete with the BDSSA ETL process
- Schedule opposite (or subsequent: what better time to run cleanup than after a successful ETL?), and use time limits to prevent collisions.
- Time limit limitations (checked between SQL queries: configure blasadmin to hit fewer total objects at a time)
Other: open a support ticket if you don’t already have one!
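The "nibble away at it" approach can be planned ahead as a series of shrinking retention targets. This is a minimal Python sketch of that planning step only. The starting age, target, and step size are hypothetical numbers, and how each value is handed to the cleanup command is not shown on the slides, so it is left out here.

```python
from typing import List


def retention_steps(oldest_days: int, target_days: int, step_days: int) -> List[int]:
    """Plan successive cleanup passes, each trimming at most step_days of history."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    steps = []
    days = oldest_days
    while days > target_days:
        # Never go below the final retention target.
        days = max(target_days, days - step_days)
        steps.append(days)
    return steps


# Hypothetical backlog: a year of history, a 90-day target, 60 days trimmed per pass.
for pass_number, keep_days in enumerate(retention_steps(365, 90, 60), start=1):
    print(f"pass {pass_number}: clean up data older than {keep_days} days")
```

Smaller steps mean more passes but shorter individual runs, which makes it easier to fit each pass into an available window and to stay clear of the ETL.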
Recommended Schedule
Based on experience in customer environments, we have created some recommendations on how and when to run a database cleanup in a BMC Server Automation environment.
General recommendation:
- Run Historical cleanups on a daily basis (e.g. Mon-Sat)
- Run the hard delete cleanups (cleanupAllSharedObjects, cleanupDatabase, cleanupFileServer) on a weekly basis (e.g. every Sunday)
- If you are using BMC BladeLogic Decision Support for Server Automation (BDSSA, “reports”), ensure that the ETL does not run during the 'hard delete' cleanup run.
Recommended Schedules (cont’d)
Typical Jobs and execution schedules:
Daily 'Historical' Cleanup
- Batch Job "Daily Cleanup" - set to run in series
  – Cleanup Retention
  – Batch Job "Cleanup Historical" - set to run in parallel
    » Cleanup JobRunEvent
    » Cleanup AuditTrail
    » Cleanup JobSchedule
    » Cleanup AuditResult
    » Cleanup SnapshotResult
    » Cleanup ComplianceResult
  – Cleanup AllAppServerCaches
Weekly 'Hard Delete' Cleanup
- Batch Job "Weekly Cleanup" - set to run in series
  – Cleanup Retention
  – Cleanup Database
  – Cleanup Shared Objects (8.1 SP4+ only)
  – Cleanup FileServer
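The daily and weekly split can be written down as a small lookup. This is a minimal Python sketch, assuming the Mon-Sat / Sunday example from the recommendation; the job names are copied from the slide, while the function and constant names are illustrative. In practice the schedule would be set on the batch jobs in BSA itself, not in a script like this.

```python
from datetime import date
from typing import List, Tuple

DAILY_CLEANUP = [
    "Cleanup Retention",
    "Cleanup Historical",  # parallel batch: JobRunEvent, AuditTrail, JobSchedule, ...
    "Cleanup AllAppServerCaches",
]
WEEKLY_CLEANUP = [
    "Cleanup Retention",
    "Cleanup Database",
    "Cleanup Shared Objects",
    "Cleanup FileServer",
]


def batch_for(day: date) -> Tuple[str, List[str]]:
    """Return the batch job name and its members, run in series, for the given day."""
    # date.weekday(): Monday is 0 and Sunday is 6.
    if day.weekday() == 6:
        return "Weekly Cleanup", WEEKLY_CLEANUP
    return "Daily Cleanup", DAILY_CLEANUP


name, members = batch_for(date(2012, 12, 9))
print(name, "->", ", ".join(members))
```

Note that Cleanup Database comes before Cleanup FileServer in the weekly list, in line with the requirement that the database cleanup completes before the file server cleanup runs.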
Monitoring DB Cleanup
Use one of the following scripts to query the status of cleanup.
For Oracle:
    ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY/MM/DD HH24:MI';
    SELECT task_id, task_name, started_at,
           CAST((CAST(updated_at AS timestamp) - CAST(started_at AS timestamp)) AS interval day(0) to second(0)) AS duration,
           current_action,
           to_char(deleted_rows,'9G999G999G999') AS deleted_rows
    FROM delete_tasks
    WHERE ended_at IS NULL;
Or this script for Oracle or this script for MSSQL.
For SQL Server (duration in minutes; note that the datepart "mm" would mean months):
    SELECT task_id, task_name, started_at,
           DATEDIFF(mi, started_at, updated_at) AS duration,
           current_action, deleted_rows
    FROM delete_tasks
    WHERE ended_at IS NULL;
If cleanup fails, the first place to look is in the delete_errors table:
    SELECT * FROM delete_errors;
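The logic of the status query can be tried out without touching a production database. This is a minimal Python sketch that uses an in-memory SQLite table as a stand-in for delete_tasks. The column names come from the queries on this slide, but the column types, the sample rows, and the SQLite date arithmetic are assumptions made for the demonstration, not the real BSA schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table: column names follow the slide, types are assumed.
conn.execute(
    """CREATE TABLE delete_tasks (
           task_id INTEGER, task_name TEXT, started_at TEXT, updated_at TEXT,
           ended_at TEXT, current_action TEXT, deleted_rows INTEGER)"""
)
# Made-up sample rows: one task still running, one finished.
conn.executemany(
    "INSERT INTO delete_tasks VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "sample running task", "2012-12-05 01:00:00", "2012-12-05 01:45:00",
         None, "deleting", 120000),
        (2, "sample finished task", "2012-12-02 01:00:00", "2012-12-02 03:00:00",
         "2012-12-02 03:00:00", "done", 900000),
    ],
)

# Same idea as the slide's query: tasks with no end time are still in progress.
rows = conn.execute(
    """SELECT task_id, task_name,
              CAST(ROUND((julianday(updated_at) - julianday(started_at)) * 1440) AS INTEGER)
                  AS duration_minutes,
              current_action, deleted_rows
       FROM delete_tasks
       WHERE ended_at IS NULL"""
).fetchall()

for task_id, task_name, minutes, action, deleted in rows:
    print(f"task {task_id} ({task_name}): {minutes} min, {action}, {deleted:,} rows deleted")
```

Polling a query like this at intervals and comparing deleted_rows between polls gives a rough deletion rate, which helps judge whether a run will finish inside its window.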
Monitoring Growth
Monitoring Oracle tablespace growth (fewer surprises)
- It is good to monitor database growth over time with a handy SQL script and cron job. Modify them to match your environment and set them to run as a sysdba. This will send you an email every day showing how much tablespace was consumed by the tables compared to the day before. This method can also be useful to spot problematic trends (for example, someone enabled the Audit Trail logging on Server.Read).
Gather statistics (use ours, not just the default)
- Oracle provides a default routine to calculate statistics on the tables, and the DBA usually has this enabled by default. BMC has worked with Oracle to create an improved procedure, which ships in the -external-files.zip for each version and is available from the EPD. The Oracle default should be disabled, and the DBA should install and enable the BMC procedure to run at least weekly. This will improve both cleanup performance and database/user interface performance in general.
Alternative Gather Stats check (dbdiagnostics)
- As an alternative to running the script that was mentioned, you could also use the 'dbdiagnostics' command to determine whether the stats are up to date. In the /NSH/br directory there is a 'dbdiagnostics' binary. It can be run with:
    # ./dbdiagnostics runDiag diagId=1000006
- The results can be viewed with:
    # ./dbdiagnostics getResLastExec diagId=1000006
    diagId=1000006
    execDiagId=2000040
    execStartTime=2012-09-21 10:00:30.0
    messageLevel=INFO
    message=DBMS_STATS_CHK: DBMS_STATS on the Database ran 221 days ago, which is NOT OK. The Expected running of DBMS_STATS is once in 15 days. Please run BL_GATHER_SCHEMA_STATS PROC for this schema.
    messageTime=2012-09-21 10:00:31.0
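If the dbdiagnostics check is run on a schedule, its message can be parsed to raise an alert. This is a minimal Python sketch, assuming the message wording matches the sample output on this slide; other BSA versions may word it differently, and the function names are illustrative.

```python
import re
from typing import Optional, Tuple

# Message text copied from the sample output on the slide.
SAMPLE_MESSAGE = (
    "DBMS_STATS_CHK: DBMS_STATS on the Database ran 221 days ago, which is NOT OK. "
    "The Expected running of DBMS_STATS is once in 15 days. "
    "Please run BL_GATHER_SCHEMA_STATS PROC for this schema."
)


def stats_age(message: str) -> Optional[Tuple[int, int]]:
    """Return (days since last run, expected interval in days), or None if not found."""
    ran = re.search(r"ran (\d+) days ago", message)
    expected = re.search(r"once in (\d+) days", message)
    if ran is None or expected is None:
        return None
    return int(ran.group(1)), int(expected.group(1))


def stats_are_stale(message: str) -> bool:
    """True when statistics are older than the expected interval."""
    parsed = stats_age(message)
    if parsed is None:
        raise ValueError("message does not match the expected DBMS_STATS_CHK format")
    age_days, expected_days = parsed
    return age_days > expected_days


if stats_are_stale(SAMPLE_MESSAGE):
    print("Statistics are stale: schedule BL_GATHER_SCHEMA_STATS")
```

Raising an error on an unrecognized message, instead of silently reporting "fresh", keeps a format change from hiding stale statistics.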
How to run cleanup in large / busy environments
- Manage cleanup run time vs. ETL and critical jobs
- Cleanup will consume some amount of performance capacity: plan for it
- Adjust runtime and # of objects considered to complete within shorter windows:
  – SQL Delete queries are unforgiving of running out of time and UNDO space
  – 63 minutes of cleanup in a 60 minute window
- Offline cleanups may be required to “catch up”
- Measure how much data is being generated weekly: treading water vs. forward progress
- Manage logging levels, max # of objects considered in a single query
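"Treading water vs. forward progress" comes down to comparing what is generated each week with what cleanup removes. This is a minimal Python sketch of that comparison; all row counts are hypothetical, and the function names are made up for this example.

```python
import math
from typing import Optional


def weeks_to_catch_up(backlog_rows: int, generated_per_week: int,
                      deleted_per_week: int) -> Optional[int]:
    """Weeks until the backlog is cleared, or None when cleanup is only treading water."""
    net_progress = deleted_per_week - generated_per_week
    if net_progress <= 0:
        return None  # Cleanup removes no more than is generated: no forward progress.
    return math.ceil(backlog_rows / net_progress)


def fits_window(estimated_minutes: float, window_minutes: float) -> bool:
    """A cleanup that needs longer than its window will not complete."""
    return estimated_minutes <= window_minutes


# Hypothetical numbers: 50M rows of backlog, 5M generated and 7M deleted per week.
print("weeks to catch up:", weeks_to_catch_up(50_000_000, 5_000_000, 7_000_000))
# The slide's example: 63 minutes of cleanup does not fit a 60 minute window.
print("63 min in a 60 min window fits?", fits_window(63, 60))
```

When the first function returns None, the options on the slide apply: reduce what is generated, widen the window, or plan an offline cleanup to catch up.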
Cleanup – Fileserver & Appserver
File Server Cleanup
- Will clean up files by age
- Caution: if your patch payloads or metadata are mounted under the file server (e.g. /patch), cleanup will clean them up too!
App Server Cleanup
Logs / cores
- Check the “br” directory for zipped up old log files, core files (can easily be 2GB each)
- Old snapshot files (depending on version)
- Extra junk in /tmp on Solaris (impacts swap)
- Full C: on Windows
- Easy to spot check with “du -sk *”
Agent Health & Agent Cleanup
[Figure: BMC Server Automation (BladeLogic) Logical Architecture diagram showing the Console, Mid Tier, and Nodes tiers]
Agent Health
Agent Health is critical to successful job runs because the appserver is generous when trying to talk to a slow remote agent.
- JOB_PART_TIMEOUT
Agent Health Survey:
- Managed servers go up and down regularly
- Run the “Update Server Properties” Job periodically, and before a critical job
- The job updates the AGENT_STATUS property:
  – “Agent is Alive” for hosts that are up, vs.
  – “Agent is Unavailable” for hosts that are down.
- Use AGENT_STATUS in Server Smart Groups to include only available hosts in Jobs
- Can’t deploy to a host that’s not up
Recovery:
- Re-run the Update Server Properties Job more often against a server group that only includes “down” servers
- Use a Server Smart Group to identify hosts that have been out of contact > 30 days
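The two smart group ideas above are simple filters on server properties. This is a minimal Python sketch of the same filtering logic outside BSA. The AGENT_STATUS values are the ones quoted on the slide; the Server record, its last_contact field, and the host names are hypothetical and only stand in for whatever property a real smart group condition would use.

```python
from datetime import date
from typing import List, NamedTuple

ALIVE = "Agent is Alive"
UNAVAILABLE = "Agent is Unavailable"


class Server(NamedTuple):
    name: str
    agent_status: str
    last_contact: date  # Hypothetical field, used here only for the 30-day check.


def available_targets(servers: List[Server]) -> List[str]:
    """Hosts a job should include: only those whose agent reported as alive."""
    return [s.name for s in servers if s.agent_status == ALIVE]


def out_of_contact(servers: List[Server], today: date, days: int = 30) -> List[str]:
    """Unavailable hosts that have been out of contact for more than `days` days."""
    return [
        s.name
        for s in servers
        if s.agent_status == UNAVAILABLE and (today - s.last_contact).days > days
    ]


# Made-up fleet for the demonstration.
fleet = [
    Server("web01", ALIVE, date(2012, 12, 5)),
    Server("db01", UNAVAILABLE, date(2012, 11, 20)),
    Server("old01", UNAVAILABLE, date(2012, 10, 1)),
]
today = date(2012, 12, 5)
print("deploy to:", available_targets(fleet))
print("investigate:", out_of_contact(fleet, today))
```

The filters are only as fresh as the last Update Server Properties run, which is why the slide recommends running that job before a critical job.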
Agent Cleanup
The Problem:
- Deployment packages (and their rollbacks) accumulate in the Transactions folder
The Solution:
- blcli Delete cleanupAgent
- Cleans up all objects on the agent older than the specified parameter.
- Typical time frame of 30-90 days.
Upgrades / Upkeep
Upgrades / Upkeep
- Upgrades are the easiest way to get new, improved (supported) code, with bugfixes.
- Agent Upgrades:
  – Agent upgrades are key to retaining functionality & avoiding running into “solved” bugs or compatibility issues when upgrading
  – File Deploy Job upgrade method
  – Unified Agent Installer upgrades coming in a future release
  – Upgrading agents outside of BSA (MSI installer)
- Require Planning (may be significant planning for larger environments!)
- There may be benefit to cleanup before upgrade or use offline upgrade techniques
- Many customers are asking for features that have been delivered in current releases
- TEST before PROD!
Configuration Guidance
Additional Resources & Information
- BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692
- BSA Best Practices Webinar Episode 1: Deployment Architecture: https://communities.bmc.com/communities/docs/DOC-21693
- Online Documentation - BSA Deployment Architecture Best Practices: http://docs.bmc.com/docs/display/public/bsa82/Deployment+architecture
- Product Documentation: http://docs.bmc.com/docs/display/public/bsa82/Home
- BMC Communities (public forum; BMC website documents, discussions, whitepapers, additional information): https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic
- What to do when you inherit a BSA installation, including “How to” videos: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2012/06/15/taking-the-reins-server-automation
Howto Videos
- Initial Install – Database Setup: On BMCdocs YouTube at http://www.youtube.com/watch?v=91FEUDVD6sE
- Initial Install – File Server and App Server Installs: On Communities YouTube at http://www.youtube.com/watch?v=m7Y3SY23kuQ
- Initial Install – Console GUI and Appserver Config: On Communities YouTube at http://www.youtube.com/watch?v=uwqlj60Lvo0
- Compliance Content Install: On BMCdocs YouTube at http://www.youtube.com/watch?v=bXdaogDsCNc
- Compliance Quick Audit: On BMCdocs YouTube at http://www.youtube.com/watch?v=i8BLi4WAWEY
- BSA 8.2 Patching - Setting Up a Windows Patch Catalog: On Communities YouTube at http://www.youtube.com/watch?v=nfpFpOuub9k
- Windows Patch Analysis: On Communities YouTube at http://www.youtube.com/watch?v=ODWhC01uEaQ
- Patching in Short Maintenance Windows with BMC BladeLogic Server Automation: On Communities YouTube at http://www.youtube.com/watch?v=o6Lfzbb3JZg
- Basic Software Packaging: http://www.youtube.com/watch?feature=player_embedded&v=dtOWTTFqsaY
- SOCKS Proxies: https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic/blog/2012/11/30/how-to-use-socks-proxies-with-bsa-to-deal-with-firewalls-and-overlapping-ip-ranges
Q&A