Multimedia Sensor Fusion for Intelligent Camera Control and Human-Computer Interaction
Steven George Goodridge
Table of Contents
ABSTRACT
ABOUT THE AUTHOR
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 OBJECTIVES
1.2 SIGNIFICANCE OF THE RESEARCH
1.3 INSPIRATION
1.4 FEASIBILITY
1.5 APPROACH
2 RELATED WORK
2.1 MULTIMODAL SYSTEMS
2.1.1 Speech Recognition
2.1.2 Face Detection
2.1.3 Face Recognition
2.1.4 Facial Expression Recognition and Lip Reading
2.1.5 Body Tracking and Gesture Recognition
2.1.6 Sound Localization
2.1.7 Audio-Visual Fusion
2.2 SENSOR FUSION
2.2.1 Levels of Representation in Sensor Fusion
2.2.2 Computational Methods
3 SOUND LOCALIZATION
3.1 SPATIAL HEARING IN HUMANS AND ANIMALS
3.1.1 Interaural Intensity Difference and Interaural Time Difference
3.1.2 Head-Related Transfer Function
3.1.3 Precedence Effect
3.1.4 Sound Localization in the Barn Owl
3.2 TECHNIQUES FOR CALCULATION OF INTERAURAL TIME DELAY
3.2.1 Cross-Correlation
3.2.2 Interaural Phase Delay
3.2.3 Onset, or Envelope, Delay
3.2.4 Systems Combining Multiple Methods
3.3 AN ONSET SIGNAL CORRELATION ALGORITHM
3.3.1 The Algorithm
3.3.2 Selection of Envelope Decay Coefficient
3.3.3 Effects of Onset Processing on a Speech Signal
3.4 EXPERIMENTAL RESULTS
3.4.1 Simulation
3.4.2 Room Experiments
3.4.3 Performance with Multiple People Speaking
4 PRIMITIVE VISION
4.1 MOTION DETECTION
4.1.1 Color Image Difference
4.1.2 Background Image Subtraction
4.2 FACE DETECTION
4.2.1 Skin Tone Detection
4.2.2 Experimental Results
5 AUDIO-VISUAL SENSOR FUSION FOR FACE DETECTION
5.1 DETECTING NOISY FACE PIXELS
5.2 COMMON COORDINATES
5.3 A BAYESIAN MULTIMODAL PIXEL CLASSIFIER
5.4 PERFORMANCE BENEFIT
6 TARGET TRACKING
6.1 COORDINATE TRANSFORMATION
6.2 DATA ASSOCIATION
6.3 KALMAN FILTER
6.3.1 Modeling the Target
6.3.2 Estimation Update
7 BEHAVIOR FUSION
7.1 MULTI-AGENT REACTIVE CONTROL
7.2 FUZZY CONTROL BEHAVIORS
7.2.1 Evaluation of Fuzzy Rules
7.2.2 When Centroid Defuzzification Fails
7.2.3 Effects of a Large Input Space
7.3 ARBITRATION MECHANISMS
8 GENERIC CAMERA BEHAVIORS
8.1 TYPES OF VISION MOVEMENTS
8.2 RESOURCE CONTENTION
8.3 CONTROL SYSTEM INPUTS AND OUTPUTS
8.4 FOLLOWING A SOUND
8.5 FOLLOWING A TALKING HEAD
8.6 FOLLOWING A MOVING TARGET
9 APPLICATIONS
9.1 VIDEOCONFERENCING
9.1.1 Fusion of Behaviors
9.1.2 Experimental Results
9.2 SURVEILLANCE
9.2.1 Fusion of Behaviors
9.2.2 Experimental Results
9.3 FUTURE APPLICATIONS
9.3.1 Speech Association
9.3.2 Identity Tracking
9.3.3 Intelligent Room Control
10 CONCLUSIONS
10.1 CONTRIBUTIONS
10.2 LIMITATIONS
10.3 OBSTACLES TO PERFORMANCE TESTING
10.4 RECOMMENDATIONS FOR FUTURE WORK
REFERENCES
FOOTNOTES
APPENDIX A: SOUND EXPERIMENT DATA
APPENDIX B: TIPS ON IMPLEMENTATION