Human Validation Study

To evaluate the quality and reliability of our MoodArchive dataset, we conducted a comprehensive human validation study on Amazon Mechanical Turk. Workers compared the original web-collected alt-text with our LLaVA-generated detailed captions, judging both content accuracy and emotional interpretation.
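
For concreteness, the sketch below shows one way such a comparison batch could be prepared as an MTurk input CSV. The file names, field names (image_url, alt_text, llava_caption), and the randomized A/B ordering are illustrative assumptions, not details of our actual pipeline.

```python
import csv
import json
import random

# Hypothetical input: one record per image with its original alt-text and
# the LLaVA-generated emotional caption (field names are assumptions).
with open("moodarchive_captions.json") as f:
    records = json.load(f)

with open("mturk_comparison_batch.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["image_url", "caption_a", "caption_b", "a_is_llava"]
    )
    writer.writeheader()
    for rec in records:
        # Randomize which caption appears as option A to reduce position bias
        # (a common precaution, assumed here rather than taken from the paper).
        llava_first = random.random() < 0.5
        a, b = (
            (rec["llava_caption"], rec["alt_text"])
            if llava_first
            else (rec["alt_text"], rec["llava_caption"])
        )
        writer.writerow(
            {
                "image_url": rec["image_url"],
                "caption_a": a,
                "caption_b": b,
                "a_is_llava": int(llava_first),
            }
        )
```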

MTurk Validation Interface

Task Instructions: Comparing Web-collected Captions vs. LLaVA Emotional Descriptions

Figure 5: MTurk task instructions detailing the evaluation criteria for comparing captions.

Evaluation Interface: Original vs. LLaVA-generated Emotional Captions

Figure 6: The comparison interface showing image, original caption, and LLaVA-generated emotional caption.

Validation Results

85% of LLaVA-generated captions were selected by workers as better describing the images than the original web-collected alt-text, confirming the high quality of our automated annotation approach.
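
A minimal sketch of how such a preference rate could be computed from the raw worker judgments is shown below. The results file layout, column names (image_id, choice), and the per-image majority-vote aggregation are assumptions for illustration only.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical MTurk results file: one row per worker judgment, with an
# image identifier and the worker's choice ("llava" or "original").
votes = defaultdict(Counter)
with open("mturk_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        votes[row["image_id"]][row["choice"]] += 1

# Resolve each image by majority vote among its workers, then compute the
# fraction of images on which the LLaVA caption was preferred.
llava_wins = sum(
    1 for counts in votes.values()
    if counts["llava"] > counts["original"]
)
win_rate = llava_wins / len(votes)
print(f"LLaVA caption preferred on {win_rate:.1%} of images")
```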

Common Rejection Reasons (15% of captions):

  • Inappropriate word choices or overly dramatic descriptions
  • Inaccurate emotion detection or classification
  • Misalignment between detected emotions and cultural interpretations

BibTeX

@article{,
  title={},
  author={},
  booktitle={},
  year={}
}