# hey, i'm rui

grad student at UWA (Perth, Australia) — working on multimodal learning, mostly vision-language models and why they confidently describe things that aren't there.

my research sits somewhere between computer vision and NLP. right now i'm spending most of my time thinking about:

- hallucination in VLMs — when does a model "see" something vs. invent it? how do we measure that reliably? (toy sketch after this list)
- cross-modal attention — what's actually happening when a model aligns a word with a region in an image?
- visual reasoning chains — can we evaluate whether a model's intermediate steps are grounded, not just the final answer?
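
on the measurement question, the cheapest reliable handle is a CHAIR-style check (Rohrbach et al., 2018): count the objects a generated caption mentions that aren't in the image's annotations. here's a toy sketch; the vocab, synonym map, and example are made up for illustration, not taken from vlm-hallu-probe:

```python
# toy CHAIR-style object-hallucination check: any object the caption
# mentions that isn't in the image's ground-truth annotations counts as
# hallucinated. vocab / synonyms / example data are illustrative only.

OBJECT_VOCAB = {"dog", "frisbee", "car", "tree", "person"}
SYNONYMS = {"puppy": "dog", "automobile": "car", "man": "person"}

def mentioned_objects(caption: str) -> set[str]:
    """Map caption tokens to canonical object names in the vocab."""
    tokens = caption.lower().replace(".", "").split()
    return {SYNONYMS.get(t, t) for t in tokens} & OBJECT_VOCAB

def chair_i(caption: str, gt_objects: set[str]) -> float:
    """Per-instance CHAIR: fraction of mentioned objects absent from the image."""
    mentioned = mentioned_objects(caption)
    if not mentioned:
        return 0.0
    return len(mentioned - gt_objects) / len(mentioned)

caption = "A man throws a frisbee to his puppy next to a car."
gt = {"person", "dog", "frisbee"}   # objects actually annotated in the image
print(chair_i(caption, gt))         # 0.25 -> "car" is hallucinated
```

everything hard lives in what this toy skips: open vocabularies, attribute- and relation-level errors rather than bare objects, and captions that hedge. run the same membership test per reasoning step instead of per caption and you have the skeleton of the visual-reasoning-chain question too.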

current status: debugging something that was working yesterday

some things i've built / been building:

| project | what it does |
| --- | --- |
| vlm-hallu-probe | lightweight toolkit for probing hallucination patterns in VLMs — object, attribute, relation levels |
| attn-scope | attention map analysis + visualization for multimodal transformers, because looking at attention weights is half of debugging (rollout sketch below) |
| visual-cot-eval | evaluating whether visual chain-of-thought reasoning steps are actually grounded in the image |
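
since self-attention rollout does a lot of the work in attn-scope, here's a minimal version of the technique (Abnar & Zuidema, 2020). it assumes the per-layer attention tensors a transformers model returns when called with output_attentions=True; it's a sketch of the algorithm, not attn-scope's actual API:

```python
import torch

def attention_rollout(attentions: list[torch.Tensor]) -> torch.Tensor:
    """Self-attention rollout (Abnar & Zuidema, 2020).

    attentions: one (batch, heads, seq, seq) tensor per layer, e.g. the
    `attentions` tuple from a HuggingFace model called with
    output_attentions=True. Returns (batch, seq, seq): how much each
    output position ultimately draws from each input position.
    """
    rollout = None
    for layer_attn in attentions:
        attn = layer_attn.mean(dim=1)                 # average over heads
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = 0.5 * attn + 0.5 * eye                 # account for the residual stream
        attn = attn / attn.sum(dim=-1, keepdim=True)  # keep rows stochastic
        rollout = attn if rollout is None else attn @ rollout
    return rollout
```

indexing the CLS row of the result and reshaping it over the patch grid gives the usual relevance heatmap. cross-attention needs separate treatment, since the two token streams don't share a residual path.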

recently been reading / thinking about:

- the gap between automated VQA benchmarks and what "understanding" actually means
- token merging strategies in ViTs and whether they hurt grounding (the core op is sketched below)
- whether RLHF-aligned models are more or less prone to hallucination (the answer is complicated)
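
the canonical reference point for token merging is ToMe (Bolya et al., 2022). here's a stripped-down sketch of its bipartite soft matching step, minus proportional attention and CLS protection, with plain averaging; names and shapes are mine, not from any particular codebase:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Stripped-down ToMe-style bipartite soft matching (Bolya et al., 2022).

    x: (seq, dim) token features for one image. Tokens split into
    alternating sets A and B; each A-token is matched to its most
    cosine-similar B-token, and the r best-matched A-tokens are averaged
    into their matches. Real ToMe also tracks merged-token sizes and
    protects the CLS token; this sketch doesn't.
    """
    a, b = x[0::2], x[1::2].clone()
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T  # (|A|, |B|) cosine sims
    best_sim, best_dst = sim.max(dim=-1)        # each A-token's closest B-token
    order = best_sim.argsort(descending=True)   # most similar pairs first
    for s in order[:r].tolist():                # merge r pairs; a loop keeps it simple
        d = best_dst[s]
        b[d] = (b[d] + a[s]) / 2
    return torch.cat([a[order[r:]], b], dim=0)  # unmerged A-tokens + all B-tokens
```

the grounding worry is exactly that averaging step: once two patches fold into one token, a word that aligned with one of them now points at a blend.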

not much else to say here. i mostly keep notes in obsidian, break things in jupyter notebooks, and occasionally remember to commit my experiments before losing them.

## Popular repositories

1. **fengrui128**
2. **visual-cot-eval** (Python): faithfulness evaluation for visual chain-of-thought reasoning in VLMs — are the reasoning steps actually grounded?
3. **attn-scope** (Python): attention map analysis and visualization for multimodal transformers — cross-attention, self-attention rollout, head importance
4. **vlm-hallu-probe** (Python): a lightweight toolkit for probing hallucination patterns in vision-language models — object, attribute, and relation levels
5. **waverless** (Go): forked from WaveSpeedAI/waverless
6. **OmniAgent** (Python): forked from YeQing17-2026/OmniAgent — an agent capable of self-evolving and dynamically hardening security