grad student at UWA (Perth, Australia) — working on multimodal learning, mostly vision-language models and why they confidently describe things that aren't there.
my research sits somewhere between computer vision and NLP. right now i'm spending most of my time thinking about:
- hallucination in VLMs — when does a model "see" something vs. invent it? how do we measure that reliably? (there's a tiny sketch of what i mean just below this list)
- cross-modal attention — what's actually happening when a model aligns a word with a region in an image?
- visual reasoning chains — can we evaluate whether a model's intermediate steps are grounded, not just the final answer?
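
on the "measure that reliably" part: the baseline i keep coming back to is a CHAIR-style object check, i.e. did the caption mention an object that the annotation doesn't contain. a minimal sketch of that idea (the synonym map and function names are made up for illustration, not the actual vlm-hallu-probe API):

```python
from typing import Iterable

# tiny illustrative synonym map; a real vocab would come from e.g. COCO categories
SYNONYMS = {
    "person": {"person", "man", "woman", "people"},
    "bicycle": {"bicycle", "bike"},
    "couch": {"couch", "sofa"},
}

def mentioned_objects(caption: str, vocab: dict) -> set:
    """canonical object names whose surface forms appear in the caption."""
    tokens = set(caption.lower().replace(",", " ").replace(".", " ").split())
    return {obj for obj, forms in vocab.items() if tokens & forms}

def object_hallucination_rate(caption: str, gt_objects: Iterable, vocab: dict = SYNONYMS) -> float:
    """fraction of mentioned objects that aren't in the ground-truth annotation."""
    mentioned = mentioned_objects(caption, vocab)
    if not mentioned:
        return 0.0
    return len(mentioned - set(gt_objects)) / len(mentioned)

# the model invents a bike that isn't annotated: 1 of 3 mentioned objects is hallucinated
print(object_hallucination_rate("a man sitting on a couch next to a bike",
                                gt_objects=["person", "couch"]))   # 0.333...
```

object-level is the easy case; attribute and relation checks follow the same pattern but need much fuzzier matching, which is where it stops being a ten-line function.
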
current status: debugging something that was working yesterday
some things i've built / been building:
| project | what it does |
|---|---|
| vlm-hallu-probe | lightweight toolkit for probing hallucination patterns in VLMs — object, attribute, relation levels |
| attn-scope | attention map analysis + visualization for multimodal transformers, because looking at attention weights is half of debugging |
| visual-cot-eval | evaluating whether visual chain-of-thought reasoning steps are actually grounded in the image |
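
the core loop behind attn-scope isn't complicated: grab per-head attention weights, average the heads, reshape the patch dimension back into a grid, look at it. a bare-bones version with a random tensor standing in for real weights (with a real model you'd typically get these by passing `output_attentions=True` to a transformers forward call):

```python
import torch
import matplotlib.pyplot as plt

num_heads, num_patches = 12, 196            # 14x14 patch grid for a 224px ViT
attn = torch.rand(num_heads, num_patches + 1, num_patches + 1)   # +1 for the CLS token
attn = attn / attn.sum(dim=-1, keepdim=True)                     # rows sum to 1, like softmax output

# attention from the CLS query to every image patch, averaged over heads
cls_to_patches = attn.mean(dim=0)[0, 1:]    # shape: (196,)
grid = cls_to_patches.reshape(14, 14)

plt.imshow(grid, cmap="viridis")
plt.title("CLS -> patch attention, head-averaged")
plt.colorbar()
plt.savefig("attn_map.png")                 # plt.show() if you're in a notebook
```

most of the real work is bookkeeping: which layer, which head, which query token, and keeping the patch grid aligned with the original image.
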
recently been reading / thinking about:
- the gap between automated VQA benchmarks and what "understanding" actually means
- token merging strategies in ViTs and whether they hurt grounding
- whether RLHF-aligned models are more or less prone to hallucination (the answer is complicated)
not much else to say here. i mostly keep notes in obsidian, break things in jupyter notebooks, and occasionally remember to commit my experiments before losing them.
