I am an undergraduate student at Sichuan University (SCU), majoring in Computer Science and Technology.
My major GPA (CS courses): 3.79/4, 89.39/100;
overall GPA: 3.78/4, 89.25/100.
At Sichuan University, I have been working as a research assistant at MachineILab since 2022,
advised by Prof. Jizhe Zhou.
I am participating in one National Natural Science Foundation of China project and one National
Key R&D Program of China project.
My research areas include recommendation systems (RS), large language models (LLMs), computer
vision (CV), and graph neural networks (GNNs).
My previous research was primarily focused on topics within computer vision, such as tampering
detection and object recognition tasks.
Currently, I am exploring large language models (LLMs) and recommendation systems (RS).
News
2024-09-01 - Began pursuing a PhD degree.
2024-06-26 - Graduated from Sichuan University with a bachelor's degree.
IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection &
Localization
[arxiv]
Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang,
Chi-Man Pun, Jiancheng Lv, Jizhe Zhou
A comprehensive benchmark is yet to be established in the Image Manipulation Detection &
Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading
model evaluations, severely undermining the development of this field. However, the scarcity of
open-sourced baseline models and inconsistent training and evaluation protocols make conducting
rigorous experiments and faithful comparisons among IMDL models challenging. To address these
challenges, we introduce IMDL-BenCo, the first comprehensive IMDL benchmark and modular codebase.
IMDL-BenCo: i) decomposes the IMDL framework into standardized, reusable components and
revises the model construction pipeline, improving coding efficiency and customization
flexibility; ii) fully implements or incorporates training code for state-of-the-art models
to establish a comprehensive IMDL benchmark; and iii) conducts deep analysis based on the
established benchmark and codebase, offering new insights into IMDL model architecture, dataset
characteristics, and evaluation standards. Specifically, IMDL-BenCo includes common processing
algorithms, 8 state-of-the-art IMDL models (1 of which is reproduced from scratch), 2 sets of
standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of
robustness evaluation. This benchmark and codebase represent a significant leap forward in
calibrating the current progress in the IMDL field and inspiring future breakthroughs.
Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph
Reasoning
[arxiv]
Zhuohang Jiang, Bingkui Tong, Xia Du, Ahmed Alhammadi, Jizhe Zhou
To explicitly derive the objects' privacy class from the scene contexts, in this paper, we interpret
the privacy-sensitive objects identification (POI) task as a visual reasoning task aimed at the
privacy of each object in the scene. Following
this interpretation, we propose the PrivacyGuard framework for POI. PrivacyGuard contains three
stages. i) Structuring: an unstructured image is first converted into a structured, heterogeneous
scene graph that embeds rich scene contexts. ii) Data Augmentation: a contextual perturbation
oversampling strategy is proposed to create slightly perturbed privacy-sensitive objects in a scene
graph, thereby balancing the skewed distribution of privacy classes. iii) Hybrid Graph Generation &
Reasoning: the balanced, heterogeneous scene graph is then transformed into a hybrid graph by
endowing it with extra "node-node" and "edge-edge" homogeneous paths. These homogeneous paths allow
direct message passing between nodes or edges, thereby accelerating reasoning and facilitating the
capture of subtle context changes.
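To make the idea of "node-node" and "edge-edge" homogeneous paths concrete, here is a minimal sketch, assuming a scene graph stored as a list of (subject, predicate, object) triples. In this sketch a node-node path links two objects joined by a relation, and an edge-edge path links two relation edges that share an object (a line-graph construction). The function `build_hybrid_paths` and the triple format are hypothetical illustrations, not the PrivacyGuard implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_hybrid_paths(triples):
    """Hypothetical sketch: derive homogeneous paths from a heterogeneous scene graph.

    triples: list of (subject, predicate, object) relation edges.
    Returns (node_node, edge_edge):
      node_node: set of frozensets {a, b} for objects joined by a relation edge
      edge_edge: set of frozensets {i, j} of edge indices that share an object
    """
    node_node = set()
    edges_at_node = defaultdict(list)  # object -> indices of relation edges touching it
    for idx, (subj, _pred, obj) in enumerate(triples):
        if subj != obj:
            node_node.add(frozenset((subj, obj)))
        edges_at_node[subj].append(idx)
        if obj != subj:  # avoid listing a self-loop edge twice
            edges_at_node[obj].append(idx)

    edge_edge = set()
    for incident in edges_at_node.values():
        # every pair of edges meeting at the same object gets a direct path
        for i, j in combinations(incident, 2):
            edge_edge.add(frozenset((i, j)))
    return node_node, edge_edge


scene = [("person", "holds", "card"), ("card", "on", "table"), ("person", "near", "laptop")]
nn, ee = build_hybrid_paths(scene)
print(sorted(tuple(sorted(p)) for p in nn))
print(sorted(tuple(sorted(p)) for p in ee))
```

Here edges 0 and 1 share "card" and edges 0 and 2 share "person", so message passing could flow directly between those relation edges without routing through the object nodes.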
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
[arxiv]
Xiaochen Ma, Bo Du, Zhuohang Jiang, Ahmed Y. Al Hammadi, Jizhe Zhou
Due to limited datasets, there is currently no pure ViT-based approach for image manipulation
localization (IML) to serve as a benchmark, and CNNs dominate the entire task. Nevertheless, CNNs
suffer from weak long-range and non-semantic modeling. To bridge this gap, based on the fact that
artifacts are sensitive to image resolution, amplified under multi-scale features, and massive at
the manipulation border, we formulate the solution as building a ViT with high-resolution capacity,
multi-scale feature extraction capability, and manipulation edge supervision that can converge
with a small amount of data. We term this simple but effective ViT paradigm IML-ViT, which has
significant potential to become a new benchmark for IML. Extensive experiments on five benchmark
datasets verify that our model outperforms state-of-the-art manipulation localization methods.
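Manipulation edge supervision needs a target that marks the border of the tampered region. One common way to obtain such a target, shown here as a minimal sketch under the assumption of a binary ground-truth mask, is morphological dilation minus erosion: a pixel is marked as edge if its neighbourhood contains both manipulated and authentic pixels. The function `edge_mask` and its `width` parameter are hypothetical and not taken from the IML-ViT code.

```python
def edge_mask(mask, width=1):
    """Hypothetical sketch: mark pixels near the border of a binary manipulation mask.

    mask: list of lists of 0/1 values (1 = manipulated pixel).
    A pixel is marked 1 if its (2*width+1) x (2*width+1) neighbourhood, clipped at
    the image border, contains both manipulated and authentic pixels, which is
    equivalent to dilation minus erosion of the mask.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = {
                mask[yy][xx]
                for yy in range(max(0, y - width), min(h, y + width + 1))
                for xx in range(max(0, x - width), min(w, x + width + 1))
            }
            out[y][x] = 1 if len(values) == 2 else 0
    return out


# Left half of a 4x4 image is manipulated; the border lies between columns 1 and 2.
m = [[1, 1, 0, 0] for _ in range(4)]
for row in edge_mask(m):
    print(row)  # each row -> [0, 1, 1, 0]
```

A larger `width` produces a thicker edge band, trading localization sharpness for more supervised pixels.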
Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing
on Low-level Features [arxiv]
Xiaochen Ma, Zhuohang Jiang, Xiong Xu, Chi-Man Pun, Jizhe Zhou
Image manipulation localization (IML) requires models to carry out a semantic understanding of the
entire image. In this paper, we reformulate the IML task as a high-level vision task that greatly
benefits from low-level features. We propose a method to enhance the Masked Autoencoder (MAE) by
incorporating high-resolution inputs and a perceptual loss supervision module, which we term
Perceptual MAE (PMAE). While MAE has demonstrated an impressive understanding of object semantics,
PMAE can also comprehend low-level semantics with our proposed enhancements. This paradigm
effectively unites the low-level and high-level features of the IML task, and extensive experiments
show that it outperforms state-of-the-art tampering localization methods on five publicly
available datasets.
Contour-Aware Contrastive Learning for Image Manipulation Localization
Qin Li, Chunfang Yu, Zhuohang Jiang, and Jizhe Zhou
We propose a novel Contour-aware Contrastive Learning Network (CaCL-Net) based on
the encoder-decoder architecture. On the encoder side, since the contour is of foremost concern
in IML, we treat the image patches sampled along the manipulation contour as hard examples
and set them as anchors. Patches of purely tampered and purely authentic pixels are set as
positives and negatives, respectively, to conduct contrastive learning. The decoder then
localizes the manipulated regions and restores the explicit contours of the manipulations
through the proposed Contour Binary Cross-Entropy (CBCE) loss.
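The anchor/positive/negative setup described above can be illustrated with a generic InfoNCE-style objective. This is a minimal sketch under stated assumptions, not the CaCL-Net loss: patch embeddings are assumed to be plain float vectors, similarity is cosine similarity, and the temperature `tau` and the function name `contour_anchor_loss` are hypothetical choices. The loss pulls a contour anchor toward tampered patches and pushes it away from authentic ones, averaged over the positives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def contour_anchor_loss(anchor, positives, negatives, tau=0.1):
    """Hypothetical InfoNCE-style loss for one contour anchor.

    anchor:    embedding of a patch sampled on the manipulation contour
    positives: embeddings of purely tampered patches
    negatives: embeddings of purely authentic patches
    """
    neg_sum = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(cosine(anchor, p) / tau)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)


tampered = [[0.9, 0.1], [0.8, 0.3]]
authentic = [[-1.0, 0.0]]
good = contour_anchor_loss([1.0, 0.0], tampered, authentic)   # anchor near tampered patches
bad = contour_anchor_loss([-1.0, 0.0], tampered, authentic)   # anchor near authentic patches
print(good < bad)  # True
```

Because contour patches mix tampered and authentic pixels, they are the hardest to separate, which is why treating them as anchors concentrates the learning signal on the boundary.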
Research Projects
Research on Scene Graph Structure Learning Method for Private Object Detection
Advisor: Jizhe Zhou
Participated as an intern, National Natural Science Foundation of China, 2024
The privacy-sensitive object detection problem requires the model to locate private objects with
bounding boxes in images or videos. Research on privacy-sensitive object detection has important
value for personal privacy protection. Privacy-sensitive object detection is essentially a scene
reasoning problem. However, existing privacy-sensitive object detection methods are all based on
the object detection framework. Lacking scene reasoning ability, existing methods suffer from
limited detection accuracy, generalizability, and interpretability. This project intends to build a
set of privacy-sensitive object detection methods with scene reasoning capability through scene
graphs. Unlike other tasks, privacy-sensitive object detection requires a non-parametric scene
graph structure to keep the graph sparse, dynamic, and interpretable. Therefore, this project
correspondingly proposes scene graph structure learning methods. By studying 1) a distillation
method for the graph structure to sparsify the scene graph, 2) a transfer method between the scene
graphs of different frames to make the scene graph dynamic, and 3) a privacy-rule reasoning method
based on the scene graph structure, the project aims to solve the problem of scene graph generation
with a non-parametric graph structure, build a new privacy-sensitive object detection framework
based on scene reasoning, and break the bottlenecks of privacy-sensitive object detection methods
in accuracy, generalizability, and interpretability. This privacy-sensitive object detection
framework would also enrich the theoretical framework and application scenarios of neural networks.
Intelligent control and full-life feedforward deduction technology through pre-planning
and post-evaluation
Advisor: Jizhe Zhou
Participated as an intern, National Key R&D Program of China, 2023
This topic proposes to adopt the scheme of "first completing information, then path
reasoning". Specifically, this sub-project intends to improve the initial network graph based on
the database established in previous projects, further considering the co-occurrence frequency of
impact factors. Then, the network graph is completed and cleaned again using an external-knowledge
causal information completion method based on distant supervision, and a path reasoning algorithm
based on depth-first traversal of the graph is used to achieve feedforward reasoning. Finally, a
human-computer interactive network information verification method based on uncertainty reasoning
revises the causal relationships in the network graph according to human feedback on the reasoning
paths, further improving the accuracy and recall of the path reasoning results.
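Path reasoning by depth-first traversal can be sketched as enumerating all simple paths from an impact factor to a target in a directed causal graph. This is a minimal sketch, assuming the graph is an adjacency dict and that a depth limit bounds the search; the function `dfs_paths`, the `max_depth` parameter, and the toy graph are hypothetical and not the project's implementation.

```python
def dfs_paths(graph, source, target, max_depth=5):
    """Hypothetical sketch: enumerate simple causal paths from source to target.

    graph:     dict mapping a node to the list of nodes it directly influences
    max_depth: maximum number of edges allowed in a path
    Uses an explicit stack, so traversal is depth-first.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_depth:  # path already has max_depth edges
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                stack.append((nxt, path + [nxt]))
    return paths


causal = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}
print(sorted(dfs_paths(causal, "A", "E")))
# -> [['A', 'B', 'C', 'E'], ['A', 'B', 'D', 'E'], ['A', 'C', 'E']]
```

In a workflow like the one described, each returned path could then be shown to a human reviewer, and confirmed or rejected edges fed back into the graph before reasoning is rerun.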
Education
Sichuan University, Chengdu, Sichuan, China
B.E. in Computer Science and Technology • Sep. 2020 to Jun. 2024
Hong Kong Polytechnic University, Hong Kong, China
Ph.D. in Computer Science and Technology • Sep. 2024 to Present
Covariant Association, Sichuan University
President of the Covariant Association • Sep. 2022 to Jun. 2023
The Covariant Association has more than 400 members, who exchange ideas on computer science
technology.
Awards
Computer Design Competition China, 2023
Provincial First Prize
Comprehensive First Class Scholarship Sichuan University, Sichuan, China, 2022
Top 1%
Outstanding Student of Sichuan University Sichuan University, Sichuan, China, 2022