SCALABLE MULTIMEDIA SYSTEM FOR INTERACTIVE SURVEILLANCE AND VIDEO COMMUNICATION APPLICATIONS. Marco Raggio, Ivano Barbieri, Gianluca Bailo Department of Biophysical and Electronic Engineering - University of Genoa Via Opera Pia 11 A – 16145 Genova – ITALY Tel: +39 010 3532274; fax: +39 010 3532175 e-mail:
[email protected],
[email protected],
[email protected]
ABSTRACT

Recent advances in embedded computer electronics and the integration of multimedia standard specifications are enabling cost-effective terminals for video communication and conferencing, with potential for video surveillance applications over both wireless and wired links. Following a tutorial approach, this paper describes the practical implementation of a multimedia system and terminals built at our labs. The system (Real Time Scalable Video Communication Unit, VCU) is based on the latest extensions to ITU-T Recommendations H.324 and H.323; it performs real-time audio, video and data communication with currently available network technology and consumer electronics components. The system was provided with enhanced video and remote-control setting capabilities, leading to multimedia terminals suitable both for interactive video conferencing applications and for remote video surveillance, in particular when variable and/or low-bitrate links are available, by exploiting video scalability.

MULTIMEDIA COMMUNICATION STANDARDS

The H.324 and H.323 specifications address multimedia communication standards over PSTN channels and IP-based networks.

The scope of H.324 [1] includes the system components depicted in Figure 1; it describes terminals for low-bitrate multimedia communication, utilising V.34 modems operating over the POTS and ISDN networks. Support for each media type (voice, data and video) is optional but, if supported, the ability to use a specified common mode of operation is required, so that all terminals supporting that media type can interoperate. H.324 allows more than one channel of each type to be in use. Other Recommendations in the H.324 series include the H.223 multiplex, H.245 control [4], the H.263 video codec [3] and the G.723.1 speech codec [5].

Figure 1 Scope of the ITU-T H.324 multimedia system Recommendation (video codec H.263/H.261; audio codec G.723; H.223 multiplex/demultiplex; V.34/V.8 modem; data protocols V.14, LAPM, etc.; H.245 control protocol over SRP/LAPM procedures; V.25ter modem control).

The H.323 standard [2] (see Figure 2) provides a foundation for audio, video and data communications across IP-based networks, including the Internet. H.323 covers the technical requirements for audio and video communication in LANs that do not provide a guaranteed Quality of Service (QoS). Four main modules compose the H.323 Recommendation: Terminal, Gateway, Gatekeeper and Multipoint Control Unit (MCU). Further, three components are required in a generic H.323-standard-compliant terminal: Q.931 (call signalling and call set-up, originally specified for ISDN connections) [4]; RAS (Registration/Admission/Status), the protocol used to communicate with Gatekeepers; and RTP/RTCP support for sequencing audio and video packets [6] with the H.225.0 multiplexer [7].

Figure 2 Scope of the H.323 terminal equipment Recommendation (video codec H.263/H.261 with receive-path delay; audio codecs G.711, G.722, G.723, G.728, G.729; I/O equipment; H.225.0 layer; local area network interface; applications and system control with H.245 control, H.225.0 call control and H.225.0 RAS control).
THE SCALABLE VIDEO SURVEILLANCE UNIT
The system was integrated following a software (SW) multithread approach with round-buffer interfaces; separate threads are created for coding and decoding. Inside the two data flows, processing is further subdivided into separate threads connected by round-buffer interfaces. The SW development of the VCU was based on the C++ programming language, organised as a main project with static libraries as sub-projects. Preprocessor statements allow various configurations to be set for the VCU. The hardware platforms for the implementation of the multimedia terminals were consumer multimedia and embedded PCs, running Windows NT 4.0 or Windows CE (mobile terminal) and the Linux OS. The system can inter-operate over multiple networks, allowing transmission protocol selection and parameter tuning at user-interface level, which makes the system suitable both for algorithm testing/research purposes and for multimedia terminal prototyping according to application and user requirements. A modular and standard-based HW/SW approach has been followed for the system implementation and integration. This approach, rather than a custom algorithm and system solution, brings some overhead in terms of protocol complexity and interoperability testing, but also a number of benefits in terms of exploitation of available technology at different levels of system development and of technology transfer. The VCU terminal implements an H.263 video coder, the dual-rate G.723.1 speech coder (in addition to the mandatory G.711), a GSM speech coder (half and full rate) and H.245 system control. The video codec integrated into the system is H.263v2 [3], whose video scalability is exploited in the surveillance application, for instance while connecting GSTN, LAN and GSM lines at the same time in a multipoint scheme (see also Section 2.6).
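The thread-per-stage decomposition above hinges on the round-buffer interfaces between threads. As a minimal sketch (not the VCU source code; the class and method names are hypothetical), such a bounded round buffer connecting a producer thread, such as the video encoder, to a consumer thread, such as the multiplexer, could look like this:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative bounded "round buffer": a fixed ring of packet slots with
// blocking put/get, suitable as the interface between an encoder thread
// and a multiplexer thread. Names are hypothetical.
class RoundBuffer {
public:
    explicit RoundBuffer(std::size_t slots) : buf_(slots) {}

    // Block until a slot is free, then enqueue one coded packet.
    void put(std::vector<unsigned char> pkt) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [this] { return count_ < buf_.size(); });
        buf_[head_] = std::move(pkt);
        head_ = (head_ + 1) % buf_.size();  // advance ring index
        ++count_;
        not_empty_.notify_one();
    }

    // Block until a packet is available, then dequeue it.
    std::vector<unsigned char> get() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [this] { return count_ > 0; });
        std::vector<unsigned char> pkt = std::move(buf_[tail_]);
        tail_ = (tail_ + 1) % buf_.size();  // advance ring index
        --count_;
        not_full_.notify_one();
        return pkt;
    }

private:
    std::vector<std::vector<unsigned char>> buf_;
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
    std::mutex m_;
    std::condition_variable not_empty_, not_full_;
};
```

The fixed capacity gives natural back-pressure: a fast encoder simply blocks until the multiplexer drains a slot, which is one way the coding and transmission rates can stay coupled.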
2.1 The Video Codec

The video codec integrated in the VCU is an H.263v2-compliant scalable codec, which supports the main options for multilayer spatial and SNR scalability and packetised transmission of the stream. The video encoder and decoder are composed in a single static library even though they are implemented as separate threads. The video encoder/decoder I/O interfaces integrate a round-buffer mechanism as the interface to the multiplexer/demultiplexer. Different video acquisition grabbers can be selected according to application requirements; the selection can be made at the user interface or at software programming level through common APIs and SW libraries.

2.2 The Multiplexer

Multiplexing and demultiplexing of video, audio, data and control streams into and from a single bit stream for the H.324 configuration was implemented according to H.223. This Recommendation specifies the frame structure, the format of fields and the procedures of the packet multiplexing protocol. The control procedures necessary to implement the multiplexing protocol were provided by integrating an H.245 system control into the VCU. Multiplexer and demultiplexer are implemented as two separate entities, both consisting of two distinct layers: the adaptation layer and the multiplex layer. For modularity, multiplexer and demultiplexer were integrated into the VCU as a library. Additionally, a proprietary solution for optimal synchronisation among audio and video threads was studied and implemented.

2.3 The Communication Control

The ITU standard H.245 specifies the syntax and semantics of the messages used in the control protocol and the routines for exchanging them. The communication control protocol was implemented as a separate thread, and the interface between the control protocol and the system is made through a round buffer. The system control was integrated into the VCU main software project as a sub-project. The supported procedures allow the terminals to determine master and slave, exchange capabilities, open and close the logical channels and exchange entries for the multiplex tables. The retransmission protocol (SRP) is supported as well.

Figure 3 Multipoint configuration (Central Station; Remote Station/Storage, 10 ÷ 100 Mbyte; GSM link at ~9.6 kbps).

In the H.324 configuration, the capability of interconnecting more than two terminals (with connectivity requirements over heterogeneous networks, for instance GSTN at GSM rate and ISDN), exploiting video scalability, was made possible by the implementation of a custom Multipoint Control Unit (MCU); for packet-network communication, specifications are instead provided within the H.323 environment. For the surveillance application the system is set up with asymmetric configurations, since the two VCU terminals have different I/O requirements. For instance, the remote terminal transmits the video data, camera position and environmental data and receives the camera and video-encoder parameter controls, while the VCU at the control station receives and transmits data and controls in the opposite direction.

2.4 Network Interfaces

Network interfaces supported by the VCU are analogue POTS, digital ISDN and packet-based networks. PSTN uses a V.34+ modem at 33.6 kbps, N-ISDN runs at 64 kbps or 128 kbps plus the E-DSS1 channel, and GSM at 9.6 kbps. Static libraries were developed to access the HW network interfaces: the Telephone Application Program Interface (TAPI) [12] for the analogue and GSM lines, the Common Application Program Interface (CAPI) [13] for the digital ISDN lines, and the Winsock2 API with unreliable datagrams (UDP), reliable connection-oriented byte streams (TCP) and raw sockets. According to the user settings, the required logical channels for video and data are opened at the multiplex layer; the data logical channel is used both for data and for control. A thread, started after the connection is established, performs the control of the remote VCU, sending and receiving data and controls.

2.5 The H.323-Based Video Surveillance System

In order to provide connectivity over packet-based networks, additional software libraries were included in the VCU system, in particular H.225.0 and Q.931 libraries. Additional preparation of both audio and video payloads was also required. For this configuration, the H.223 multiplexer library is replaced by the H.225.0 library, which provides packetisation, synchronisation and support for the RTP/RTCP protocols. H.225.0 was developed as a multithread library, running both on Microsoft Windows and on Unix OS. The H.225.0 library kernel is composed of several sets of functions implementing the Real-Time Protocol (RTP) and the Real-Time Control Protocol (RTCP). H.225.0 uses socket system calls in order to provide user data channels and control channels, to control the thread data flow and to give the application level a set of utility functions. Three control message types are present and transmitted: RAS/H.225.0 (UDP socket at port 1719), Q.931/H.225.0 (TCP socket at port 1720) and H.245/H.225.0. These messages carry a TPKT header, composed of 4 octets, which is used to transmit the message length. The thread managing H.245 moves each H.245 message to a round-buffer queue; a semaphore signals the message. The H.245/H.225.0 sender thread reads from this round buffer and sends the message through the H.245/H.225.0 TCP socket. The H.245/H.225.0 receiver thread examines the header to learn the message length, then moves the message to a round-buffer queue; the message presence is notified by a semaphore, and the H.245 thread reads and processes it. The data channel is implemented as a reliable TCP socket, to send and receive user data (e.g. files via ftp, whiteboard, etc.). This channel and its capabilities are negotiated between terminals at connection set-up, through H.245 messages. For multiple audio and video streams, unreliable transport via UDP follows the IETF RTP specification [6] to handle audio and video streaming; RTP is controlled according to the IETF RTCP specification. Each terminal can open up to three UDP channels: one for audio transmission and up to two for video transmission. The IETF specifies an RTP/RTCP payload format for H.263v2 [14], which is currently adopted in H.225.0 [7]. When bitstreams are coded using the scalability options, both for spatial and for SNR scalability, each layer uses a unique IP address and port number combination; the temporal relation between layers is expressed using the RTP timestamp for synchronisation. The video stream is carried as payload data within RTP packets to enhance its resiliency to packet losses; this payload scheme defines the usage of the RTP fixed header and of the H.263v2 video packet structure. A formal RTP payload is not currently defined for G.723.1 or G.711; G.723.1 and G.711 payloads were implemented following the guidelines in [18].

The audio channel gets its data from the G.723.1 or G.711 audio encoder. The first video channel is reserved for the base-layer stream from the H.263v2 video encoder, while the second (optional) video channel is reserved for the enhancement-layer video stream. Audio and video synchronisation is achieved using an efficient timestamp mechanism. Improved error-resilience algorithms are used, based mainly on the correct positioning of GOBs inside the transmitted frames. In order to minimise delay and optimise synchronisation, RTCP periodically monitors the RTP transmitted/received data, providing reports that are used to control system parameters.
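The 4-octet TPKT header that frames the Q.931/H.225.0 and H.245/H.225.0 messages on TCP can be sketched as follows. This is an illustrative reconstruction, not the VCU code: TPKT framing (a version octet, a reserved octet, and a 16-bit big-endian length counting header plus payload) is standardised in RFC 1006, and the helper names here are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Wrap a message payload in a 4-octet TPKT header so the peer can
// recover the message boundary from the TCP byte stream.
std::vector<std::uint8_t> tpktWrap(const std::vector<std::uint8_t>& payload) {
    std::uint16_t total = static_cast<std::uint16_t>(payload.size() + 4);
    std::vector<std::uint8_t> frame;
    frame.push_back(3);                                        // TPKT version
    frame.push_back(0);                                        // reserved
    frame.push_back(static_cast<std::uint8_t>(total >> 8));    // length, high octet
    frame.push_back(static_cast<std::uint8_t>(total & 0xFF));  // length, low octet
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Total frame length announced by a received header: this tells the
// receiver thread how many octets to read before queueing the message.
std::uint16_t tpktLength(const std::uint8_t header[4]) {
    return static_cast<std::uint16_t>((header[2] << 8) | header[3]);
}
```

A receiver would read exactly 4 octets, call tpktLength, then read the remaining length minus 4 octets before handing the message to the H.245 round-buffer queue described above.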
2.6 The User Interface

A VCU terminal is provided with a simple graphical user interface (GUI) which provides a windows interface to several parameters (Figure 4).

Figure 4 Graphical User Interface of the Video Communication Unit.

The GUI provides a set of controls over local and remote terminals that allow generic interactive remote surveillance and control applications. Communication with a remote terminal is programmed through dialog boxes; recording/playback facilities are also available. Through a Windows console it is possible to establish a 'chat' with a remote terminal, transfer files/data and control system operations. The system can be configured for full remote control of remote terminals: the communication can be remotely started and stopped, and parameters of the remote terminals can be modified from the local/central terminal. It is possible to set different access rights and to have multiple accesses; parameters of a remote VCU can be updated at run time from a dialog box. A VCU terminal can be configured to use two or more links at the same time, for instance, if available, a POTS line at 33.6 kbps and a GSM line at 9.6 kbps, with one terminal configured as Central Station, from where far sites are surveyed, and the other terminals configured as remote Surveillance Stations. One link can be used to poll the remote sites and the other to automatically receive emergency calls after programmable event detection. Event detection can start several actions from the stream-handling point of view. For instance, exploiting video scalability, a low-bitrate video is sent to a mobile terminal connected at GSM rate, while higher-quality video (base plus enhancement video layers) is either stored on the remote terminal itself or sent to another terminal connected over a higher-bandwidth link. This stream can be stored for recording/documentation purposes and/or downloaded off-line to the mobile terminal.

A former release of the system, the output of a European project [17], was demonstrated in [16]; it was equipped with a specific MMI and additional software, to be used in field tests for remote surveillance and fire detection.
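The event-driven stream handling described above can be sketched as a routing decision over the scalable layers. This is a hypothetical illustration, not the VCU API: the struct names, the broadband rate and the "store locally when no broadband link exists" fallback are assumptions; only the base-layer-to-GSM behaviour comes from the text.

```cpp
#include <string>
#include <vector>

// One routing entry produced after event detection.
struct Route {
    std::string destination;  // e.g. "gsm-mobile", "broadband", "storage"
    int bitrateKbps;          // nominal link rate (0 = local storage)
    bool wantsEnhancement;    // whether this sink gets base + enhancement layers
};

// Build the routing table for one detected event: the base H.263v2 layer
// always goes to the narrow GSM link; the full-quality stream (base plus
// enhancement) goes to a broadband peer when one is reachable, otherwise
// it is kept on local storage for later off-line download.
std::vector<Route> routesOnEvent(bool broadbandAvailable) {
    std::vector<Route> routes;
    routes.push_back({"gsm-mobile", 10, false});    // base layer only (~9.6 kbps link)
    if (broadbandAvailable) {
        routes.push_back({"broadband", 64, true});  // base + enhancement
    } else {
        routes.push_back({"storage", 0, true});     // keep full quality locally
    }
    return routes;
}
```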
VCU SURVEILLANCE FEATURES
Additionally, specific features of interest for surveillance applications were integrated. In particular, a low-complexity face detection algorithm [8] was integrated with the video encoder to improve the subjective image quality in a region of interest of the coded sequence and to provide input for face/object recognition systems. The algorithm applies a large number of ellipses of different sizes to the image and chooses the one that best fits a human face. First, the image is pre-processed in order to isolate the areas of the image where skin is present; the ellipse-fitting process is then applied to the binary image obtained in the first step. The output of the face detection is used to define a mask of regions of interest, and the bitrate control algorithm was designed to deal with these masks [9]. At the user-interface level it is also possible to manually select a region of interest, for masking purposes or to improve the video quality in a fixed region of the coded sequence. A more detailed description can be found in [10].

For transmission over networks without guaranteed QoS according to H.324, a simple error-resilience mode was implemented for the video codec, based on the automatic refreshing of the images by INTRA blocks whenever the buffer level is lower than a predefined threshold [11]. This mode does not require a feedback channel. It is particularly suited to video sequences with long, very-low-motion periods interspersed with short high-motion periods, typical of video surveillance applications. Using the error-resilient mode, the image is refreshed frequently during the very-low-motion periods, as the buffer level is low; during the high-motion periods the buffer level increases and the number of blocks "artificially" classified as INTRA approaches zero. This mode can also be enabled while working over packet networks according to H.323, in addition to the standard techniques for error recovery and concealment. Combinations of H.263v2 options can be used to improve error resilience and concealment in transmission over packet networks; although efficient, they increase the complexity of the overall system, and we found simpler solutions, like the one proposed in [15], to be of practical application in our system.
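The feedback-free INTRA-refresh mode above can be sketched as a small scheduling policy. This is an illustrative reconstruction, not the VCU implementation: the cyclic block scan, the class name and the parameters are assumptions; the source only states that blocks are forced INTRA whenever the buffer level falls below a threshold.

```cpp
#include <cstddef>
#include <vector>

// Pick which blocks to force INTRA for the current frame. During
// low-motion periods the buffer is below the threshold, so blocks are
// steadily refreshed in a cyclic scan of the frame; during high-motion
// periods the buffer fills and no blocks are artificially refreshed.
class IntraRefresher {
public:
    IntraRefresher(std::size_t blocksPerFrame, std::size_t blocksPerRefresh,
                   double bufferThreshold)
        : total_(blocksPerFrame), perRefresh_(blocksPerRefresh),
          threshold_(bufferThreshold) {}

    // bufferFullness in [0, 1]; returns indices of blocks to force INTRA,
    // empty when the buffer has no rate to spare.
    std::vector<std::size_t> blocksToRefresh(double bufferFullness) {
        std::vector<std::size_t> picks;
        if (bufferFullness >= threshold_) return picks;  // high motion: skip
        for (std::size_t i = 0; i < perRefresh_; ++i) {
            picks.push_back(next_);
            next_ = (next_ + 1) % total_;                // cyclic scan position
        }
        return picks;
    }

private:
    std::size_t total_, perRefresh_;
    double threshold_;
    std::size_t next_ = 0;  // where the cyclic scan resumes next frame
};
```

Because the scan position persists across frames, every block is eventually refreshed without any feedback channel, which matches the behaviour described for surveillance sequences.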
SYSTEM PERFORMANCE
The performance of a multimedia system with real-time constraints depends on several parameters. Starting from the video frame rate, its visual quality, the resulting bitrate and the computational load, it is necessary to evaluate (and provide mechanisms to optimise) synchronisation and delay. The workload distribution over the system is fundamental for performance as well. Table 1 summarises average performance figures measured in field-test conditions, internetworking over a network with video acquired from cameras.

  Format   Bitrate (kbps)   Framerate min   Framerate max
  QCIF            8               2              10
  QCIF           32               5              25
  CIF             8               0.2             2.5
  CIF            32               1.3             5.3
  4CIF            8               0.1             0.6
  4CIF           32               0.3             1.5

Table 1 VCU average transmission frame rate.
References

[1] ITU-T Recommendation H.324, "Terminal for low bitrate multimedia communication", June 1996.
[2] ITU-T Recommendation H.323-v2, "Packet based multimedia communications systems", Feb. 1998.
[3] ITU-T Recommendation H.263, "Video coding for low bitrate communication - DRAFT 21", Dec. 1997.
[4] ITU-T Recommendation H.245, "Control protocol for multimedia communication - version 2", March 1997.
[5] ITU-T Recommendation G.723.1, "Dual rate speech coder for multimedia communication transmitting at 5.3 and 6.3 kbit/s", October 1995.
[6] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time Applications", January 1996.
[7] ITU-T Recommendation H.225.0, "Media stream packetization and synchronization on non-guaranteed quality of service LANs", Nov. 1996.
[8] A. Eleftheriadis and A. Jacquin, "Automatic face location detection and tracking for model-assisted coding of video teleconferencing sequences at low bit-rates", Signal Processing: Image Communication, pp. 231-248, 1995.
[9] L. Lourenço and L. Corte-Real, "Comparison of the performance of VM7.0 and TMN8 rate control algorithms adapted for coding Regions of Interest", ISO/IEC JTC1/SC29/WG11, MPEG97/2521, April 1997.
[10] G. L. Foresti, P. Mahonen and C. S. Regazzoni, "Advanced Video-Based Surveillance Systems", Kluwer, in press.
[11] R. Aalmoes, "Video compression techniques over low-bandwidth lines", Master Thesis, Twente University, 1996.
[12] C. Mirho and A. Raffman, "Reach Out and Touch Someone's PC: The Windows Telephony API", Microsoft Systems Journal, Vol. 8, No. 12, December 1993.
[13] COMMON-ISDN-API V2.0, http://www.capi.org/document.htm
[14] "RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)", draft-ietf-avt-rtp-h263-video-02.txt, May 7, 1998.
[15] S. Wenger, Document Q15-G-17, "Using RFC2429 and H.263+ at low to medium bit-rates for low-latency applications", February 1999.
[16] I. Barbieri and M. Raggio, "Real Time Scalable Video Communication Demonstration at ICMCS-99", IEEE MULTIMEDIA SYSTEMS '99, Florence, Italy, June 1999.
[17] http://www.infowin.org/ACTS/RUS/PROJECTS/ac077.htm, "Scaleable Architectures With Hardware Extensions For Low Bitrate Variable Bandwidth Realtime Videocommunications".
[18] IETF RFC 2736, "Guidelines for Writers of RTP Payload Format Specifications", December 1999.