Affective Benchmarking of Movies Based on the Physiological Responses of a Real Audience

Julien Fleureau, Technicolor, Cesson-Sévigné, [email protected]
Philippe Guillotel, Technicolor, Cesson-Sévigné, [email protected]
Izabela Orlac, Technicolor, Cesson-Sévigné, [email protected]

Abstract
We propose an objective study of the emotional impact of a movie on an audience. An affective benchmarking solution is introduced, making use of a low-intrusive measurement of the well-known ElectroDermal Activity. A dedicated processing of this biosignal produces a time-variant and normalized affective signal related to the significant excitation variations of the audience. Beyond the new methodology, the originality of this paper stems from the fact that the framework has been tested on a real audience during regular cinema shows and a film festival, for five different movies and a total of 128 audience members.

Author Keywords
Affective computing; ElectroDermal Activity; Signal Processing; User Experience

ACM Classification Keywords
H.5.1. [Information Interfaces and Presentation]: Evaluation/methodology.

Copyright is held by the author/owner(s). CHI'13, April 27 – May 2, 2013, Paris, France. ACM 978-1-XXXX-XXXX-X/XX/XX.

Introduction
Knowing the affective state of an audience watching a movie has many potential applications in video content creation and distribution. A movie director may be interested in such information to validate artistic choices and/or adapt the editing to optimize the emotional flow. It may also help a content distributor select the best target countries or populations. Nevertheless, obtaining the real-time emotional state of an audience in an objective and low-intrusive manner is not easy. One direct method would be to collect, as some studios already do, a direct self-assessment from each viewer, minute by minute.
This approach is clearly too intrusive, quite subjective, and may bias the reporting by distracting the subject from the movie. Face analysis through a dedicated camera could be an alternative, but the associated algorithms [10] may be very sensitive to the user's environment (low-lighting conditions) and are not always suited to natural, "close-to-neutral" facial expressions. Another approach, adopted in this paper, is based on the recording of physiological signals. Many biological signals are known to vary with the emotional state of a subject; typical examples are the ElectroDermal Activity (EDA), the Heart Rate Variability (HRV) and the facial surface ElectroMyoGram (EMG). The correlation between such signals and the affective state is well known in the literature [8, 2], and the devices to record such biodata are becoming more and more compact and non-obtrusive [9, 4]. Furthermore, while evaluating the interest of such signals in the context of movie viewing is not new [7], recent works increasingly try to correlate the user's emotion with the video content more accurately. For example, in [11], physiological signals (including EDA) are correlated with audio features extracted from the video stimulus. In [13], the authors developed a user-independent emotion recognition method to recover affective tags for videos using the electroencephalogram (EEG), pupillary response and gaze distance. In [3], the EDA signal is correlated with the arousal axis in a context of affective film content analysis, with a special focus on narrative coherency. In [5], a real-time approach to emotional state detection is presented, where the authors focus on affective events (i.e. fast increases of the emotional arousal) during a video viewing and try to guess the associated valence using classification tools.
However, as far as we know, such works do not really answer the three following questions in the context of movie viewing: i) When did the audience significantly react? ii) How strongly did the audience react to each movie highlight? iii) What is the emotional impact of one movie compared to another? In this paper we propose an original affective benchmarking solution for movies that may answer these three questions. It is based on a dedicated processing of the EDA that produces a time-variant and normalized affective signal (termed "Affective Profile") reflecting the significant arousal variations of the whole audience over time in a quantitative and comparable way. Secondly, and for the first time, an evaluation of this framework on a real audience during regular cinema shows and a film festival is performed for five different movies and a total of 128 audience members. The methodology as well as the obtained results are presented in the following.

Affective Benchmarking Solution

EDA as an arousal sensor
Getting the mean reaction of a whole audience over time is closely related to the study of the arousal fluctuations of this same audience during the show (cf. the bi-dimensional "Valence / Arousal" representation of emotion [12]). The ElectroDermal Activity (EDA) measures the local variations of the skin's electrical conductance in the palm area, which is known to be highly correlated with the user's affective arousal [8, 1]. This signal embeds high-level human-centered information and might be used to provide a continuous and unconscious feedback about the level of excitation of the end-user. More specifically, each shift in the arousal flow of a given user is correlated with slow (∼ 2s) phasic changes in his/her EDA, consisting of a fast rising step up to a maximum value (the higher the emotional impact, the higher the peak), followed by a slow return to the initial state.
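The phasic shape described above (fast rise to a peak, then slow return to baseline) is commonly modeled as a difference of two exponentials. The sketch below illustrates that shape on a synthetic signal; the onset, amplitude and time constants are illustrative assumptions, not values measured in the paper.

```python
import numpy as np

def phasic_scr(t, onset=1.0, amplitude=0.5, rise=0.75, decay=4.0):
    """One synthetic phasic skin-conductance response: a fast rise up
    to a peak followed by a slow exponential return to baseline.
    `rise` and `decay` time constants are hypothetical values."""
    s = np.zeros_like(t)
    after = t >= onset
    dt = t[after] - onset
    # bi-exponential shape: difference of a slow and a fast exponential
    s[after] = amplitude * (np.exp(-dt / decay) - np.exp(-dt / rise))
    return s

fs = 32.0                                  # sampling rate used in the paper
t = np.linspace(0.0, 20.0, int(20 * fs))   # 20 s of signal
eda = 2.0 + phasic_scr(t)                  # assumed tonic level of 2 units
peak_time = t[np.argmax(eda)]              # peak occurs ~1.5 s after onset
```

With these constants the peak lands roughly 1.5 s after the stimulus onset, in line with the ~2 s phasic latency mentioned above.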
Individual Affective Profile
For a given audience member, a time-variant and normalized affective signal, termed "Individual Affective Profile" (IAP), may thus be directly obtained from the EDA as described in Figure 1. After a first low-pass filtering to remove possible artifacts, the EDA is differentiated and truncated to positive values in order to highlight the relevant phasic changes. The signal is then temporally filtered and sub-sampled using overlapping time windows to obtain a time resolution of 30s, sufficient to analyze the emotional flow and to take into account possible delays in the user's reaction during the movie. To remove the user-dependent part related to the amplitude of the EDA derivative (which may vary from one subject to another), the obtained signal, termed p, is normalized (area under the curve equal to one) so that it may be interpreted as a probability of reaction over time for the current audience member. The resulting signal, termed pn, is called the "Individual Affective Profile", whereas the normalization factor, termed xn, is called the "Individual Affective Intensity".

Figure 1: Description of the different steps of the algorithm to compute an Individual Affective Profile from the EDA: raw EDA signal S(t) → low-pass filtering → numerical derivation S'(t) → thresholding S'+(t) → mean value p[i] of S'+ on every time window Wi → normalization of p by the individual affective intensity xn → Individual Affective Profile pn.

Mean Affective Profile
However, taken alone, pn and xn only reflect the reaction of one given user. Such quantities are thus closely tied to the user's sensitivity and may also be corrupted by user-specific noise (motion artifacts or external reactions of the user, for instance).
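The IAP pipeline of Figure 1 can be sketched as follows. The filter cut-off frequency, filter order and window overlap are not specified in the paper, so the values below are illustrative assumptions; only the 32 Hz sampling rate, the 30 s time resolution and the unit-area normalization come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def individual_affective_profile(eda, fs=32.0, window_s=30.0, overlap=0.5):
    """Sketch of the IAP computation: low-pass filter, positive-truncated
    derivative, windowed averaging, unit-area normalization."""
    # 1) low-pass filtering to remove artifacts (assumed 1 Hz cut-off)
    b, a = butter(2, 1.0 / (fs / 2.0), btype="low")
    s = filtfilt(b, a, eda)
    # 2) numerical derivative, truncated to positive values
    ds = np.clip(np.diff(s) * fs, 0.0, None)
    # 3) mean over overlapping windows -> 30 s time resolution
    win = int(window_s * fs)
    step = int(win * (1.0 - overlap))
    p = np.array([ds[i:i + win].mean()
                  for i in range(0, len(ds) - win + 1, step)])
    # 4) normalize so the (discrete) area under the curve equals one
    x = p.sum()                     # "Individual Affective Intensity"
    pn = p / x if x > 0 else p      # "Individual Affective Profile"
    return pn, x

# toy usage on a synthetic 2-minute recording (random-walk EDA)
rng = np.random.default_rng(0)
eda = 2.0 + np.cumsum(rng.normal(0.0, 0.01, int(120 * 32)))
pn, x = individual_affective_profile(eda)
```

Because pn sums to one, it can be read as a probability of reaction over time for that viewer, which is what makes profiles comparable across subjects.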
To obtain more relevant information concerning the arousal variations of a global audience during a movie, a "Mean Affective Profile" (MAP) p̄n is computed by averaging the individual affective profiles p_n^i, 1 ≤ i ≤ N, of every audience member (where N is the size of the audience) and by scaling this quantity according to:

p̄n = (x̄n / N) Σ_{i=1}^{N} p_n^i,   where   x̄n = (1/N) Σ_{i=1}^{N} x_n^i   (1)

As a result, p̄n takes into account not only the arousal fluctuations of the audience over time but also the "Mean Affective Intensity" (MAI) of those reactions, contained in the scaling factor x̄n. By observing the variations of p̄n, one can compare different moments of a movie in an affective way and, as those quantities are scaled by the mean affective intensity, one may also compare those values between several movies (see Figure 3). 

Relevant Affective Parts
However, when N is small, the strong inter-subject variability (see Figure 3) makes the interpretation of the quantity p̄n trickier. Indeed, the values of p̄n may not be statistically significant. A statistical strategy is thus adopted to discriminate between the background noise (no or very little affective reaction) and the relevant information of the MAP. More precisely, each time sample k is considered as a random variable Pk associated with the latent unknown distribution of p_n^i[k], 1 ≤ i ≤ N. The 10% of variables with the lowest mean are considered as the background noise (other approaches based on video meta-data may also be adopted). Then, for every random variable Pk, the mean p-value ēk is computed by averaging the p-values of the bilateral Mann-Whitney-Wilcoxon tests performed between the variable Pk and each variable of the background noise (null hypothesis: the distributions of both groups are equal). Each Pk with an associated ēk value lower than 5% is considered as significantly different from the background noise and is added to the "Relevant Affective Parts" (RAP).
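The MAP scaling of equation (1) and the RAP detection can be sketched as below. The 10% noise fraction, the 5% threshold and the two-sided Mann-Whitney-Wilcoxon test come from the text; the toy data at the end and the choice to skip a sample when it is compared against itself are assumptions for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def mean_affective_profile(iaps, intensities):
    """Eq. (1): average the N individual profiles and scale the result
    by the Mean Affective Intensity (mean of the individual intensities)."""
    iaps = np.asarray(iaps)                 # shape (N, K): N viewers, K windows
    x_bar = float(np.mean(intensities))     # Mean Affective Intensity
    return x_bar * iaps.mean(axis=0)

def relevant_affective_parts(iaps, alpha=0.05, noise_frac=0.10):
    """Sketch of the RAP detection: the 10% of time samples with the
    lowest mean form the background noise; every sample whose mean
    p-value against that noise (two-sided Mann-Whitney-Wilcoxon test)
    falls below alpha is flagged as relevant."""
    iaps = np.asarray(iaps)
    means = iaps.mean(axis=0)
    n_noise = max(1, int(noise_frac * iaps.shape[1]))
    noise_idx = np.argsort(means)[:n_noise]     # background-noise samples
    rap = []
    for k in range(iaps.shape[1]):
        pvals = [mannwhitneyu(iaps[:, k], iaps[:, j],
                              alternative="two-sided").pvalue
                 for j in noise_idx if j != k]
        if pvals and float(np.mean(pvals)) < alpha:
            rap.append(k)
    return rap

# toy audience: N = 20 viewers, K = 10 windows, one strong shared peak
rng = np.random.default_rng(1)
iaps = rng.uniform(0.0, 0.01, size=(20, 10))
iaps[:, 5] += 1.0                           # every viewer reacts at window 5
intensities = rng.uniform(0.5, 1.5, size=20)
map_ = mean_affective_profile(iaps, intensities)
rap = relevant_affective_parts(iaps)
```

On this toy data the shared reaction at window 5 dominates the MAP and is flagged as a Relevant Affective Part, while the near-flat windows are treated as background noise.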
Evaluation in Theater
To validate the proposed framework, and for the first time in such a context, experiments were conducted on a real audience during regular cinema shows but also during special shows at the film festival "Festival du Film Britannique de Dinard". More precisely, for two French movies presented at the Festival de Cannes (2012), namely "De rouille et d'os" by Jacques Audiard (movie #1 - a drama involving a handicapped young girl) and "Le grand soir" by Benoît Delépine and Gustave Kervern (movie #2 - a light comedy with an off-kilter humor), the EDAs of 3 × 14 audience members were recorded during three different regular cinema shows in a partner theater named "Le Sévigné" (Cesson-Sévigné, France). People were offered a free ticket for their participation, and the ages as well as the genders of the audience (42 audience members per movie) were reasonably balanced. In a very similar way, other experiments were conducted during the film festival "Festival du Film Britannique de Dinard" in partnership with the organization committee (Dinard, France). For three other movies, namely "Now is Good" by Ol Parker (movie #3 - an upsetting drama involving a young girl with cancer), "Life in a Day" by Kevin Macdonald (movie #4 - a documentary aggregating real-life recordings from multiple participants) and "I, Anna" by Barnaby Southcombe (movie #5 - an absorbing thriller), the EDAs of respectively 24, 12 and 12 persons were also recorded, with reasonably balanced ratios in terms of age and gender. In both capture sessions, each subject was free to select his or her own seat in the room (no control of the placement), whose temperature was regulated by the theater's air-conditioning system, which theoretically prevents large temperature variations. A consumer BodyMedia Armband sensor [6, 11] was placed on the palm area of each participant (Figure 2) to record the skin conductance at a 32 Hz rate.
The entire recording process was synchronized and controlled, and the EDA was stored directly in a memory embedded in each sensor (no wireless transmission). The sensor, placed on the fingers, was very well accepted by the users, who reported not having been disturbed by its presence during the film. At the end of each show, a simple questionnaire was submitted to the participants. They were especially asked to report the different highlights in the movie they considered the most relevant in terms of emotional impact. From those post-hoc questionnaires, combined with a real-time vocal annotation of the movie (made from the projection booth by three different persons who had already watched the movies twice), a "ground truth" of the different highlights of each film was built up.

Figure 2: BodyMedia sensor placed on the fingers.

Results and Discussion
The IAPs, MAPs and RAPs were computed for the five different movies, and the results are presented in Figures 3, 4 and 5. For all movies, one can observe a Mean Affective Profile with variations and peaks quite well synchronized with the highlights identified from the audience and the vocal annotation. Those peaks are significantly different from the "background affective noise", as can be observed on the associated p-value traces (Figure 3), and their amplitudes are consistent with the intensities of the events. It is especially the case for the "drowning event" of movie #1, but also for the endings of movies #3 and #5, which were reported as very shocking, emotional or thrilling highlights by all audience members. Those events consistently have a very high mean value and are also identified as RAPs. On the contrary, non-highlight parts have a lower mean value and are generally not identified as RAPs. Beyond this punctual analysis, the MAPs also allow analyzing the temporal and dynamic aspects of the emotional arousal during each movie.
One can first observe how the values of the different MAPs tend to increase during the movie. It is especially the case for movies #1, #3 and #5, where a progressive increase of the emotional strain until a final climactic event can be noticed. Those quantitative observations are consistent with the qualitative nature of movie creation: it is often the goal of a creator to make the audience more and more involved in the story, and the computed MAPs allow a quantitative verification of this intent. In line with this latter observation, one can also notice a quite high peak in almost every movie around second 1500. This peak may illustrate a cinematographic technique well known to directors, which consists in "waking up" the audience after a certain duration from the beginning (around one third of the whole movie duration) to keep them engaged in the movie. As described before, the computed MAPs offer the possibility to quantify punctual as well as dynamic tendencies of the audience's reaction during a movie. Furthermore, due to their normalized nature, they also give a way to quantitatively compare the reaction levels between different movies. For instance, when comparing the MAPs of movies #1 to #5, it appears that movies #2 and #4 tend to have a lower mean intensity than the others. Such an observation is consistent with the global comments from the critics: movies #1 and #3 are upsetting dramas and movie #5 an absorbing thriller, whereas movies #2 and #4 are respectively a light comedy with an off-kilter humor and an "experimental" documentary. Regarding the scattering of the IAPs (Figure 3), even if a quite high variability may be noticed, the behavior of the median values seems to confirm the trend already observed on the mean values and gives a rough idea of the inter-subject variability. One can however point out a higher variability for the high mean (or median) values of the IAPs.
This behavior may be partly explained by the fact that: i) when nothing happens in the movie, globally nobody reacts, whereas ii) when an event occurs in the movie, some of the audience members react significantly while others remain insensitive. Variability is thus higher for larger RAPs, but those parts remain statistically different from the background noise according to the associated p-values. Finally, a last remark can be made regarding the interest of using the EDA to quantify sustained states. In such situations, one could think that the EDA only reacts to quite strong and precise events and thus that nothing would be noticeable in the computed MAPs for sustained parts of the movie, which are supposed to produce a long and slowly progressive increase of the subject's arousal. In fact, the EDA does not react once but repeatedly, and regular peaks may be observed in such situations. Those peaks are thus detectable in the positive-truncated derivative and appear in the associated MAPs, as one can especially observe at the thrilling end of movie #5, where the intensity remains high for more than 300 seconds.

Conclusion and Future Works
In this paper we have proposed a new solution to affectively benchmark a movie in terms of arousal variations. It is based on the measurement of the EDA and a dedicated processing that produces a time-variant and normalized affective signal. This signal, termed "Affective Profile", is directly usable by a content creator interested in studying a movie in an objective and affective manner, or by studios and distributors interested in marketing analysis and commercial predictions. The framework has been evaluated, for the first time in such a context, on a real audience during regular cinema shows and a film festival, for five different movies and a total of 128 audience members.
The results obtained in such an ecological context are consistent with the explicit reporting of the participants and show that this solution may offer content creators and distributors new possibilities to analyze and compare their movies in an affective way. A deeper validation on more movies remains to be done, as well as a refinement of the analysis on sub-categories of the audience (by gender, age, etc.). Correlations between the reactions of individual audience members would also be an interesting path to investigate. Furthermore, this paper has addressed the questions of when and how much an audience reacts, but the question of how this same audience reacts (positively or negatively) has been left aside. This question of the valence of the reactions is thus another interesting research direction that we will have to consider in the future.

Figure 3: Affective benchmarking solution applied to movie #1. (top) Box plot of the "Individual Affective Profiles". Each box represents the distribution of the IAPs over all users at a given time: the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. (middle) Mean p-value of the bilateral Mann-Whitney-Wilcoxon test; time samples with a p-value under the red line (p = 0.05) are considered "Relevant Affective Parts". (bottom) "Mean Affective Profile". Each bar represents the mean arousal of the audience at a given time sample; red horizontal lines are drawn above "Relevant Affective Parts". The movie highlights considered by the audience as the most relevant in terms of emotional impact (e.g. "Son's drowning", "Scene of violent freefight", "First love scene between Ali and Stéphanie") are superimposed.

Figure 4: "Mean Affective Profiles" computed during regular movie shows, for movies #1 (top) and #2 (bottom). Each bar represents the mean arousal of the audience at a given time sample; red horizontal lines are drawn above "Relevant Affective Parts". The movie highlights considered by the audience as the most relevant in terms of emotional impact are superimposed.

Figure 5: "Mean Affective Profiles" computed during the "Festival du Film Britannique de Dinard", for movies #3 (top), #4 (middle) and #5 (bottom). Each bar represents the mean arousal of the audience at a given time sample; red horizontal lines are drawn above "Relevant Affective Parts". The movie highlights considered by the audience as the most relevant in terms of emotional impact are superimposed.

References
[1] Boucsein, W. Electrodermal Activity (Second Edition). Springer, 2012.
[2] Calvo, R., and D'Mello, S. Affect Detection: An Interdisciplinary Review of Models, Methods, and their Applications. IEEE Trans. on Affective Computing 1, 1 (2010), 18–37.
[3] Canini, L., Gilroy, S., Cavazza, M., Leonardi, R., and Benini, S. Users' response to affective film content: A narrative perspective. Vol. 1 (2010), 1–6.
[4] Fletcher, R., Dobson, K., Goodwin, M., Eydgahi, H., Wilder-Smith, O., Fernholz, D., Kuboyama, Y., Hedman, E., Poh, M., and Picard, R. iCalm: Wearable Sensor and Network Architecture for Wirelessly Communicating and Logging Autonomic Activity. IEEE Trans. Inf. Technol. Biomed. 14, 2 (2010).
[5] Fleureau, J., Guillotel, P., and Huynh-Thu, Q. Physiological-based affect event detector for entertainment video applications. IEEE Trans. on Affective Computing 99, PrePrints (2012).
[6] Jakicic, J., Marcus, M., Gallagher, K., Randall, C., Thomas, E., Goss, F., and Robertson, R. Evaluation of the SenseWear Pro Armband to assess energy expenditure during exercise. Med. Sci. Sports Exerc. 36, 5 (2004), 897–904.
[7] Kleinkopf, H. A Pilot Study of Galvanic Skin Response to Motion Picture Violence. Texas Tech University, 1975.
[8] Lang, P. The emotion probe. Am. Psychol. 50, 5 (1995), 372–385.
[9] Matthews, R., McDonald, N., Hervieux, P., Turner, P., and Steindorf, M. A Wearable Physiological Sensor Suite for Unobtrusive Monitoring of Physiological and Cognitive State. In Conf. Proc. IEEE EMBS (2007), 5276–5281.
[10] Pantic, M., and Rothkrantz, L. Automatic Analysis of Facial Expressions: The State of the Art. IEEE PAMI 22, 12 (2000), 1424–1445.
[11] Rothwell, S., Lehane, B., Chan, C., Smeaton, A., O'Connor, N., Jones, G., and Diamond, D. The CDVPlex biometric cinema: sensing physiological responses to emotional stimuli in film. In ICPA, Citeseer (2006), 1–4.
[12] Russell, J. A circumplex model of affect. J. Pers. Soc. Psychol. 39, 6 (1980), 1161–1178.
[13] Soleymani, M., Pantic, M., and Pun, T. Multi-modal emotion recognition in response to videos. IEEE Trans. on Affective Computing 3, 2 (2011), 211–223.