Existing approaches usually form the tracking task as an appearance matching procedure. However, the discrimination ability of appearance features is insufficient in these trackers, which is caused by their weak feature supervision constraints and ...
The visual quality of photographs taken under imperfect lightness conditions can be degenerated by multiple factors, e.g., low lightness, imaging noise, color distortion, and so on. Current low-light image enhancement models focus on the improvement of ...
Visual attention in Visual Question Answering (VQA) targets at locating the right image regions regarding the answer prediction, offering a powerful technique to promote multi-modal understanding. However, recent studies have pointed out that the ...
Recently, AI-manipulated face techniques have developed rapidly and constantly, which has raised new security issues in society. Although existing detection methods consider different categories of fake faces, the performance on detecting the fake faces ...
Image-text retrieval is a fundamental cross-modal task whose main idea is to learn image-text matching. Generally, according to whether there exist interactions during the retrieval process, existing image-text retrieval methods can be classified into ...
Financial markets have always been a point of interest for automated systems. Due to their complex nature, financial algorithms and fintech frameworks require vast amounts of data to accurately respond to market fluctuations. This data availability is ...
In person re-identification (Re-ID), the data annotation cost of supervised learning, is huge and it cannot adapt well to complex situations. Therefore, compared with supervised deep learning methods, unsupervised methods are more in line with actual ...
Recently, advances in person re-identification (Re-ID) has benefitted from use of the popular multi-branch network. However, performing feature learning in a single branch with uniform partitioning is likely to separate meaningful local regions, and ...
Great progress has been achieved in domain adaptation in decades. Existing works are always based on an ideal assumption that testing target domains are independent and identically distributed with training target domains. However, due to unpredictable ...
Ultra High-Definition (UHD) and Virtual Reality (VR) video streaming over 5G networks are emerging, in which High-Efficiency Video Coding (HEVC) is used as source coding to compress videos more efficiently and polar code is used as channel coding to ...
The human–computer conversation system is a significant application in the field of multimedia. To select an appropriate response, retrieval-based systems model the matching between the dialogue history and response candidates. However, most of the ...
Recently, deep learning for video coding, such as deep predictive coding, deep transform coding, and deep in-loop filtering, has been an emerging research area. The coding gain of hybrid coding framework could be extensively promoted by the data-driven ...
The performance of human pose estimation depends on the spatial accuracy of keypoint localization. Most existing methods pursue the spatial accuracy through learning the high-resolution (HR) representation from input images. By the experimental analysis, ...
Fully mining visual cues to aid in content understanding is crucial for video captioning. However, most state-of-the-art video captioning methods are limited to generating captions purely based on straightforward information while ignoring the scenario ...
The surround-view module is an indispensable component of a modern advanced driving assistance system. By calibrating the intrinsics and extrinsics of the surround-view cameras accurately, a top-down surround-view can be generated from raw fisheye images. ...
Neural style transfer has been developed in recent years, where both performance and efficiency have been greatly improved. However, most existing methods do not transfer the brushstrokes information of style images well. In this article, we address this ...
The pansharpening is a combination of multispectral (MS) and panchromatic (PAN) images that produce a high-spatial-spectral-resolution MS images. In multiresolution analysis–based pansharpening schemes, some spatial and spectral distortions are found. It ...
The RGB-D cross-modal person re-identification (re-id) task aims to identify the person of interest across the RGB and depth image modes. The tremendous discrepancy between these two modalities makes this task difficult to tackle. Few researchers pay ...
Recovering the geometry of an object from a single depth image is an interesting yet challenging problem. While previous learning based approaches have demonstrated promising performance, they don’t fully explore spatial relationships of objects, which ...
With the widespread use of smartphones and the rise of intelligent software, we can manipulate captured photos anytime and anywhere, so the fake photos finally obtained look “Real.” If these intelligent operation methods are maliciously applied to our ...
Deep learning models are widely used in daily life, which bring great convenience to our lives, but they are vulnerable to attacks. How to build an attack system with strong generalization ability to test the robustness of deep learning systems is a hot ...
Real-time ultra-high-definition (UHD) video applications have attracted much attention, where the encoder side urgently demands the high-throughput two-dimensional (2D) transform hardware implementation for the latest video coding standards. This article ...
Currently Not Available