A Spatio-Temporal Approach for Video Text Detection and Tracking in Complex Background
Bo-Da Lin (林柏達)
July 2007


  Video text detection in unconstrained environments is a great challenge due to the arbitrary color, size, and orientation of text and its low contrast with the background. This paper proposes a novel method that utilizes not only texture analysis in the spatial domain but also temporal information in the time domain to tackle this challenging problem. A 3D wavelet transform is first applied to filter out high and low frequencies in both the spatial and temporal domains. Statistical features of text are extracted from the filtered sub-bands, and a Gaussian mixture Bayesian network is derived to classify text regions from these features. Text tracking is achieved by a modified particle filter that tracks a kernel edge orientation histogram of the text. Our experimental data contain static and dynamic text against complex backgrounds, including both scene text and graphic text in English, Japanese, and Chinese. We detect text in each frame with a 97.85% recall rate and a 94.04% precision rate. Combining the tracking method with the detection method improves efficiency by 33%, with a 93.54% recall rate and a 90.10% precision rate.
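To illustrate the spatio-temporal filtering idea, the following is a minimal sketch of a one-level separable 3D wavelet decomposition over a stack of frames, using a Haar filter pair as a stand-in (the thesis does not specify the wavelet basis; the Haar choice, the `(t, y, x)` layout, and the sub-band naming are assumptions for illustration only):

```python
import numpy as np

def haar_1d(a, axis):
    """One-level Haar decomposition along one axis: returns the
    (low-pass, high-pass) sub-bands at half resolution."""
    a = np.moveaxis(a, axis, 0)
    even, odd = a[0::2], a[1::2]
    low = (even + odd) / np.sqrt(2.0)    # smooth component
    high = (even - odd) / np.sqrt(2.0)   # detail (edge) component
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def haar_3d(volume):
    """One-level separable 3D Haar transform of a (t, y, x) frame stack.
    Returns the eight sub-bands keyed by 'L'/'H' per axis, e.g. 'LHH'
    is temporal low-pass with spatial high-pass in both y and x."""
    bands = {'': volume}
    for axis in range(3):
        split = {}
        for key, sub in bands.items():
            lo, hi = haar_1d(sub, axis)
            split[key + 'L'] = lo
            split[key + 'H'] = hi
        bands = split
    return bands
```

The high-pass sub-bands respond to spatial edges and temporal change, while the 'LLL' band carries the smooth background; discarding extreme bands corresponds to the filtering step described above.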



   As text edges are a textural feature, we propose a spatio-temporal wavelet transform to find edge intensity in image sequences. We then extract textural features with a gray-level co-occurrence matrix (GLCM) in four directions, and recognize text blocks with a novel Gaussian mixture model (GMM) classifier. A small window (8×8) is scanned left to right and top to bottom with a four-pixel overlap, so that more precise text boundaries can be extracted. In the tracking part, we track text regions every five frames with a particle filter, because text sometimes appears suddenly. We predict the position of text regions with the particle filter and use a kernel edge orientation histogram to compute the particle weights.
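The GLCM scan described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the gray-level quantization to 8 levels, the choice of energy and contrast as the statistical features, and the helper names `glcm`, `texture_features`, and `scan` are all assumptions; only the 8×8 window, the four-pixel step, and the four directions (0°, 45°, 90°, 135°) come from the text.

```python
import numpy as np

def glcm(window, offset, levels=8):
    """Normalized co-occurrence matrix of quantized gray levels
    for one (dy, dx) pixel offset."""
    q = (window.astype(np.float64) / 256.0 * levels).astype(int)
    dy, dx = offset
    h, w = q.shape
    mat = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            mat[q[y, x], q[y + dy, x + dx]] += 1
    return mat / max(mat.sum(), 1)   # joint probabilities

def texture_features(window):
    """Energy and contrast in four directions -> 8-dim feature vector."""
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # 0, 45, 90, 135 degrees
    feats = []
    for off in offsets:
        p = glcm(window, off)
        i, j = np.indices(p.shape)
        feats.append((p ** 2).sum())             # energy
        feats.append(((i - j) ** 2 * p).sum())   # contrast
    return np.array(feats)

def scan(image, size=8, step=4):
    """Slide an 8x8 window with a 4-pixel step (half-window overlap);
    yield (y, x, feature_vector) for each position."""
    h, w = image.shape
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            yield y, x, texture_features(image[y:y + size, x:x + size])
```

Each window's feature vector would then be scored by the GMM classifier; the half-window overlap is what allows the detected text boundary to be localized to within four pixels.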