Why Jimmy Can Prompt...Sometimes
TLDR:
Prompt writing is challenging, especially for non-AI experts, as highlighted by a study inspired by the Bon Appétit show Back to Back Chef
Common issues include confusion about where to start, understanding AI's capabilities, and difficulty in crafting effective prompts
OpenAI provides six strategies for better results: clear instructions, reference text, simplifying tasks, allowing "thinking" time, using external tools, and systematic testing
Additional recommendations from MIT, Harvard, and Wharton emphasize context, specificity, audience consideration, and iterative interaction with AI
My strategies include iterating on prompts, showing (not telling), asking for reasoning, and observing stylistic success in different contexts
Almost everyone I know in business school has used ChatGPT. But prompt writing, that is, phrasing a question or instruction to an AI dialogue system so that it returns the answer you want, is actually pretty challenging for a lot of reasons.
One of my favorite human-centered computing papers, “Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts,” speaks to this point. Inspired by the Bon Appétit show Back to Back Chef, the researchers asked a group of individuals to design a chatbot that walks a user through a given recipe. None of the individuals had prior prompting experience, though all participants came from STEM-related fields.
In trying to design the instructional chatbot, several common challenges arose throughout the interviews. Participants reported confusion about where to start and difficulty understanding the capabilities and limitations of the chatbot. They were unclear how to translate the cooking recipe into the right instructions, combined commands directed at the bot with commands directed at end users in the same prompt, and at times politely asked the bot to “please” accomplish “x” task. Participants were able to generate prompts with mixed success, but had even more difficulty evaluating and then updating their prompts to achieve the intended outcome.
There were two fundamental issues observed during the interviews: participants (incorrectly) generalized about prompt design from just a few outputs, and wrote prompts as if they were talking to another person. Simply put, they made inaccurate assumptions about LLM capabilities and gave instructions grounded in human-to-human interaction.
So, how do you get better at prompt writing?
Let’s start with OpenAI, which released a prompt engineering guide “for getting better results from large language models”. The guide offers six strategies and associated tactics -
Write clear instructions
Provide reference text
Split complex tasks into simpler subtasks
Give the model time to “think”
Use external tools
Test changes systematically
- and in aggregate I think the recommendations are good, but I have a few critiques. I’d rather wait a few extra moments for a right answer than quickly get a wrong, hallucinated one, so it seems strange that at times you need to explicitly ask the model for a chain of thought (i.e., a sequential reasoning process); OpenAI itself notes that the model makes reasoning errors when it tries to answer right away. I also don’t believe that OpenAI gives users the ability to test prompt changes systematically within ChatGPT. Changing one word or phrase can dramatically alter the output, and as of now there’s no straightforward process to understand how changing “x” updates “y”, nor an easy way to control for multiple updates to a given prompt.
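You can, however, approximate systematic testing outside of ChatGPT. Here’s a minimal sketch using the OpenAI Python SDK, assuming an API key is set in the environment; the model name, prompt variants, and test inputs are all placeholders for your own task. The idea is to hold the test inputs fixed and change one thing at a time, so differences in output can be attributed to the prompt change.

```python
# A minimal sketch of systematic prompt testing with the OpenAI Python SDK.
# Model name, prompt variants, and test inputs are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_VARIANTS = {
    "v1": "Summarize the following article in three sentences.",
    "v2": "You are an editor. Summarize the following article in three "
          "sentences for a business-school audience.",
}

TEST_INPUTS = [
    "First sample article text...",
    "Second sample article text...",
]

def run_variant(system_prompt: str, user_input: str) -> str:
    """Send one (prompt variant, test input) pair and return the reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # reduce run-to-run variance while comparing
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content

# Hold the inputs fixed, vary only the system prompt, and compare outputs.
for name, prompt in PROMPT_VARIANTS.items():
    for text in TEST_INPUTS:
        print(f"--- variant {name} ---")
        print(run_variant(prompt, text))
```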
Researchers at MIT, Harvard, and Wharton also offer helpful prompting nuggets. MIT advocates that, in order to write effective prompts, the user ought to provide context, be specific, and iterate on previous prompts. When it comes to providing context, the article suggests giving the LLM a persona in the prompt (i.e. act as if you are “x”). Harvard advocates for instructing the LLM on how you want your output presented, and for considering both tone and audience when generating prompts. Last, Wharton emphasizes the interactive nature of prompting: once you receive your first output, you can ask the LLM to amend a given portion of the response and/or add additional considerations.
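Put together, a prompt following those recommendations might look something like this (a hypothetical example of mine, not taken from any of the cited guides):

```
Act as an experienced admissions consultant. Review the essay below and
return your feedback as a bulleted list of no more than five points,
written in a direct but encouraging tone for an MBA applicant. After I
reply, revise whichever bullet I push back on.
```

The persona supplies context, the bulleted list specifies the output format, the tone and MBA framing address audience, and the final sentence sets up Wharton’s iterative back-and-forth.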
I don’t pretend to be a prompting expert, but here’s what’s worked for me:
Iterate: much like any first draft, I know my first prompt will be bad. I usually have to spend time providing context, tweaking, and making my instructions as clear as possible.
Show, don’t tell: don’t chat with a chatbot like you do with a friend - provide examples and details that focus the prompt on the intended outcome (see the sketch after this list).
Ask for chain of thought: if I’m asking a more complex question, I want to understand how the LLM arrived at its outcome. I’ll often ask it to explain its reasoning step by step so I can refine my original prompt if I see a misinterpretation.
Spend time in the system and save prompt approaches: one prompt may work in one situation but not in others; pay attention to which stylistic approaches succeed in which contexts, and keep track of successful conversations.
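To make “show, don’t tell” and “ask for chain of thought” concrete, here’s a hypothetical prompt in the spirit of the recipe chatbot from the Johnny study; the wording and example exchange are mine, not the paper’s:

```
You are a chatbot that walks a home cook through a recipe one step at a
time. Here is the style I want, shown as an example:

  Bot: Step 1: Preheat the oven to 375°F.
  User: done
  Bot: Step 2: Dice one yellow onion into half-inch pieces.

Follow that style for the recipe I paste below. Before we start cooking,
explain step by step how you decided to break the recipe into steps, so
I can correct any misreading before we begin.
```

The example exchange shows the bot the desired behavior instead of describing it, and the closing request surfaces the model’s reasoning early enough to fix a bad interpretation cheaply.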
I’ll end on an open question that the researchers in “Why Johnny Can’t Prompt” raised to conclude their paper: “How can tools appropriately set capability expectations for end users?” As newer, more advanced LLMs are released, we as end users will need much more guidance on their capabilities in order to understand their full potential and interact with them well.