Case Study: Can AI Take a Joke—Or Make One?

Exploring humor in AI-powered support systems
Research | UX Strategy | Conversational Design | NLP

Role: UX Researcher & Chatbot Designer
Team: Pavithra Ramakrishnan, Kexin Quan, Dr. Jessie Chin
Time: Aug 2023 – Present
Conference: ACM Creativity & Cognition 2025 (Poster + Publication)

https://dl.acm.org/doi/10.1145/3698061.3734388

Overview

Can AI be funny—and emotionally supportive?
Humor is one of the most human tools for connection, yet it’s incredibly difficult for machines to get right, especially in emotionally sensitive conversations. We explored whether large language models (LLMs) like ChatGPT, LLaMA3, and Gemini can generate and recognize humor in a way that’s emotionally appropriate, context-aware, and helpful in support-focused interactions.

🔍 The Challenge

In peer coaching, therapy, and academic advising contexts, humor can help ease tension, but if misused, it risks alienating users or eroding trust.
We asked:

Can LLMs understand when and how to use humor in supportive interactions like a friend or a counselor?

🎯 Project Goals

  • Build and evaluate 6 chatbot personas with different humor styles

  • Test how well LLMs generate, classify, and respond to humor in support scenarios

  • Publish results at a leading HCI conference (ACM C&C 2025)


My Role

  • Designed all 6 chatbot personas (Google Dialogflow + Slack)

  • Developed evaluation metrics (tone alignment, emotional fit, humor strength)

  • Led UX research tasks including data labeling, user testing, and prompt tuning

  • Co-authored the peer-reviewed ACM paper on Overleaf

  • Designed the conference poster using Microsoft PPT + Adobe Illustrator

  • Preparing to be a speaking author at ACM Creativity & Cognition 2025

    Skills: UX Research · NLP · Conversational Design · Figma · Illustrator · Academic Writing · Python APIs

💡 Research Approach

💡 Research Approach

Task 1

Humor Generation

We prompted GPT-4o, Gemini 1.5, and LLaMA3 to generate messages using three humor styles:

  • Affiliative (friendly, uplifting)

  • Self-Defeating (vulnerable, relatable)

  • No Joke (emotionally direct)

Example prompt → “window, view, weather”
Response (Self-Defeating): “I always dreamed of a beautiful window view… now I just stare at rain and regret.”

60 human raters evaluated AI responses for:

  • Humor Style Accuracy

  • Emotional Appropriateness

  • Humor Strength

Task 2

Humor Recognition

We asked the same LLMs to classify 20 human-written supportive statements, varying both:

  • Humor Style (Affiliative, Self-Defeating, No Joke)

  • Speaker Role (Counselor vs. Friend)

This tested whether models could not only be funny but also understand the context in which humor appears.

Tools & Tech

Google Dialogflow + Slack – Chatbot deployment

  • Figma + Illustrator – Visual design for poster

  • Prompt engineering + classification

  • Google Sheets – Rating collection and analysis

Meet the Slack Chatbots

Designed and developed 6 Emotionally Intelligent Chatbots (EICBs) that combined two support rolesfriend and counselor—with three distinct humor styles: no joke, affiliative, and self-deprecating. Built using Google Dialogflow and integrated on Slack, the chatbots were used to simulate emotionally supportive conversations in college student contexts.

📊 Outcomes

Humor Generation

  • GPT-4o was most accurate in tone and emotional fit

  • Affiliative humor was rated the most emotionally resonant, yet hardest to generate

  • LLaMA3 showed the most variability

  • Gemini struggled with subtle emotional humor

Humor Recognition

  • GPT-4o correctly classified humor styles more often than others

  • LLaMA3 did better at identifying roles (Counselor vs. Friend)

  • All models blurred tone between formal vs. casual roles, risking misalignment

Insight: LLMs often confuse casual peer support with formal counseling, posing ethical and trust-related design risks in sensitive applications.

Poster Preview

Accepted to ACM C&C 2025
Designed in Adobe Illustrator, the poster presents all rating metrics and model comparisons visually.

📄 View Published Paper (PDF) →

💬 Reflections