Projects
A selection of my projects on HCI and AI.
DocHub: Facilitating Comprehension of Documents via Structured Sensemaking with Large Language Models
We propose DocHub, an LLM-based interactive system that (1) identifies crucial information in documents and the connections within it, and visualizes both as node-link diagrams, (2) offers an interactive interface that lets users modify these visualizations for tailored insights and pose detailed, context-specific queries for deeper understanding, and (3) features a non-linear abstraction framework to manage the complexity of the information presented.
Who's Hated: Detecting and Analyzing the Entities Targeted by Hateful Memes
Hateful memes pose a significant threat to the well-being of online communities. Automated systems for detecting and analyzing them are therefore crucial to mitigating their adverse impact. The problem is nonetheless intrinsically difficult and remains open: memes convey messages through both images and text, so they require multimodal reasoning. Previous research on similar problems is quite limited; a holistic approach is lacking, particularly for reasoning about the target entities. Moreover, there is little analysis of why certain entities are more susceptible, and no measures have been proposed specifically to curb the dissemination of hateful memes. In this study, we aim to address these issues.
Hands-Free Is Fine: Gaze-Dominant Object Manipulation in Virtual Reality
Efficient object manipulation is critical to VR interaction, and hands-free manipulation is an approach worth exploring. We introduce a hands-free, gaze-dominant manipulation pipeline that significantly outperforms the current state-of-the-art methods.
CrossKeys: Text Entry for Virtual Reality Using a Single Controller via Wrist Rotation
Text entry is indispensable, yet virtual reality still lacks an efficient and handy text entry method. We therefore propose an innovative one-handed text entry method to achieve faster speed, higher accuracy, and a better user experience.
Temporal Transformer Networks With Self-Supervision for Action Recognition
Cross-Attention ReID is a state-of-the-art approach to pedestrian re-identification, trained on large-scale datasets generated by single-channel IR cameras and three-channel RGB cameras.
Cat Breeds Classifier
I love cats 😍 🐈! A simple CNN (Convolutional Neural Network) to classify different cat breeds. Built with PyTorch 🔥.