Seth Vallee, Delaney Chiarelli, Mikayla Luskey, Ava Rico / Psychology / Faculty Mentor: Logan Watts

Artificial intelligence (AI) software is becoming widely used across career sectors. In psychology research, AI can serve as a powerful tool for time-consuming tasks such as coding complex data. This study evaluated the accuracy of ChatGPT, a popular AI tool, in rating creative ideas by comparing its scores to those of human raters. Data came from a previous creativity study in which human raters evaluated each idea on a 5-point Likert scale across three dimensions: originality, usefulness, and elegance. Both the free and Plus versions of ChatGPT were prompted with the same instructions given to the human raters. The analysis revealed significant differences between human ratings and both versions of ChatGPT on all three dimensions, as well as a significant difference between the free and Plus versions on the usefulness dimension. On average, human raters assigned extreme scores (1 or 5) more often than ChatGPT did. Although AI could improve the efficiency of coding qualitative data, concerns remain about the validity of AI ratings, which may not fully align with human ratings, possibly because of AI’s limited understanding of context. Future research should explore AI training and prompt engineering to enhance accuracy.
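The abstract does not specify the statistical test used; as an illustrative sketch only, the comparison between human and ChatGPT ratings on a single dimension could be run as a paired t-test, since both rate the same set of ideas. The rating values below are hypothetical and are not data from the study.

```python
# Hedged sketch of comparing human vs. AI ratings of the same ideas with a
# paired t-test. All rating values here are invented for illustration.
import math
import statistics

def paired_t(a, b):
    """Paired t-statistic for two equal-length lists of ratings."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_d / (sd_d / math.sqrt(n))

# Hypothetical 5-point originality ratings for the same ten ideas.
# Note the human list uses the scale endpoints (1 and 5) more often,
# mirroring the pattern described in the abstract.
human = [5, 1, 4, 5, 1, 3, 5, 2, 1, 4]
gpt   = [4, 2, 3, 4, 2, 3, 4, 3, 2, 3]

t = paired_t(human, gpt)
print(round(t, 2))
```

The resulting t-statistic would then be compared against a t distribution with n − 1 degrees of freedom to judge significance; a library such as SciPy (`scipy.stats.ttest_rel`) would normally handle both steps in practice.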
