Causal Inference with Knowledge Distilling and Curriculum Learning for Unbiased VQA
https://doi.org/10.1145/3487042

Many recent Visual Question Answering (VQA) models rely on the correlations between questions and answers yet neglect those between visual and textual information. They perform badly if the handled data are distributed differently ...

Interactive Re-ranking via Object Entropy-Guided Question Answering for Cross-Modal Image Retrieval
https://doi.org/10.1145/3485042

Cross-modal image-retrieval methods retrieve desired images from a query text by learning relationships between texts and images. Such a retrieval approach is one of the most effective ways to simplify query preparation. Recent cross-...

Shuffle-invariant Network for Action Recognition in Videos
https://doi.org/10.1145/3485665

The local key features in video are important for improving the accuracy of human action recognition. However, most end-to-end methods focus on global feature learning from videos, while few works consider the enhancement of the local information in a ...

Learning Adaptive Spatial-Temporal Context-Aware Correlation Filters for UAV Tracking
https://doi.org/10.1145/3486678

Tracking in unmanned aerial vehicle (UAV) scenarios is one of the main components of target-tracking tasks. Unlike target tracking in general scenarios, target tracking in UAV scenarios is very challenging because of ...

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept
https://doi.org/10.1145/3491224

Reconstructing three-dimensional (3D) objects from images has attracted increasing attention due to its wide applications in computer vision and robotic tasks. Despite the promising progress of recent deep learning–based approaches, which directly ...

Domain-invariant Graph for Adaptive Semi-supervised Domain Adaptation
https://doi.org/10.1145/3487194

Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain. Traditional domain adaptation algorithms assume that enough labeled data, which are treated as the prior knowledge, are available in ...

Objective Object Segmentation Visual Quality Evaluation: Quality Measure and Pooling Method
https://doi.org/10.1145/3491229

Objective object segmentation visual quality evaluation is an emergent member of the visual quality assessment family. It aims to develop an objective measure instead of a subjective survey to evaluate the object segmentation quality in agreement with ...

CRAR: Accelerating Stereo Matching with Cascaded Residual Regression and Adaptive Refinement
https://doi.org/10.1145/3488719

Dense stereo matching estimates the depth for each pixel of the referenced images. Recently, deep learning algorithms have dramatically promoted the development of stereo matching. The state-of-the-art result is achieved by models adopting deep ...

Recognizing Gaits Across Walking and Running Speeds
https://doi.org/10.1145/3488715

For decades, very few methods have been proposed for cross-mode (i.e., walking vs. running) gait recognition. Thus, how to recognize persons by the way they walk and run remains largely unexplored. Existing cross-mode methods handle the walking-...

Inner Knowledge-based Img2Doc Scheme for Visual Question Answering
https://doi.org/10.1145/3489142

Visual Question Answering (VQA) is a research topic of significant interest at the intersection of computer vision and natural language understanding. Recent research indicates that attributes and knowledge can effectively improve performance for both ...

Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach
https://doi.org/10.1145/3490033

In this article, we present an approach for retrieving similar faces between the artistic and the real domain. The application we refer to is an interactive exhibition inside a museum, in which visitors can take a photo of themselves and search for a ...

A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals
https://doi.org/10.1145/3490686

Considerable attention has been paid to physiological signal-based emotion recognition in the field of affective computing. For reliability and user-friendly acquisition, electrodermal activity (EDA) has a great advantage in practical applications. ...

GraSP: Local Grassmannian Spatio-Temporal Patterns for Unsupervised Pose Sequence Recognition
https://doi.org/10.1145/3491227

Many applications of action recognition, especially in broad domains like surveillance or anomaly detection, favor unsupervised methods because exhaustive labeling of actions is not possible. However, very little work has been done in this domain. ...

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition
https://doi.org/10.1145/3491228

Action recognition has been a hot topic in computer vision because of its wide application in vision systems. Previous approaches achieve improvement by fusing the modalities of the skeleton sequence and RGB video. However, such methods pose a dilemma between ...

Distributed Gateway Selection for Video Streaming in VANET Using IP Multicast
https://doi.org/10.1145/3491388

The volume of video traffic delivered as an infotainment service over a vehicular ad hoc network (VANET) has increased rapidly over the past few years. Providing video streaming as a VANET infotainment service is very challenging because of the high mobility and heterogeneity of ...

Multilayer Video Encoding for QoS Managing of Video Streaming in VANET Environment
https://doi.org/10.1145/3491433

Efficient delivery and maintenance of the quality of service (QoS) of audio/video streams transmitted over VANETs for mobile and heterogeneous nodes is one of the major challenges in the convergence of this network type and these services. In this ...

When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization
https://doi.org/10.1145/3492325

Image captioning for low-resource languages has attracted much attention recently. Researchers propose to augment the low-resource caption dataset into (image, rich-resource language, and low-resource language) triplets and develop the dual attention ...

Improving Crowd Density Estimation by Fusing Aerial Images and Radio Signals
https://doi.org/10.1145/3492346

A recent line of research focuses on crowd density estimation from RGB images for a variety of applications, for example, surveillance and traffic flow control. The performance drops dramatically for low-quality images, such as those with occlusion or poor light ...

A Format-compatible Searchable Encryption Scheme for JPEG Images Using Bag-of-words
https://doi.org/10.1145/3492705

The development of cloud computing attracts enterprises and individuals to outsource their data, such as images, to the cloud server. However, direct outsourcing causes the extensive concern of privacy leakage, as images often contain rich sensitive ...

Blockchain-Based Audio Watermarking Technique for Multimedia Copyright Protection in Distribution Networks
https://doi.org/10.1145/3492803

Copyright protection in multimedia distribution is a challenging problem. To protect multimedia data, many watermarking methods have been proposed in the literature. However, most of them cannot be used effectively in a multimedia distribution ...

Deep Illumination-Enhanced Face Super-Resolution Network for Low-Light Images
https://doi.org/10.1145/3495258

Face images are typically a key component in the fields of security and criminal investigation. However, due to lighting and shooting angles, faces taken under low-light conditions are often difficult to recognize. Face super-resolution (FSR) technology ...

Scribble-Supervised Meibomian Glands Segmentation in Infrared Images
https://doi.org/10.1145/3497747

Infrared imaging is currently the most effective clinical method to evaluate the morphology of the meibomian glands (MGs) in patients. As an important indicator for monitoring the development of MG dysfunction, it is necessary to accurately measure gland-...

Towards Integrating Image Encryption with Compression: A Survey
https://doi.org/10.1145/3498342

As digital images are consistently generated and transmitted online, the unauthorized utilization of these images is an increasing concern that has a significant impact on both security and privacy issues; additionally, the representation of digital ...


