Using LLM GenAI with students
How this anti-#AIinEducation crusader uses AI with students
I am unapologetically against LLM GenAI as a service used by students in education. But I am going to try not to mount my soapbox for this post. I have used LLM GenAI in two ways that I have found useful.
My use of LLM GenAI
Manage well-known information using RAG (Retrieval-Augmented Generation)
Use cases:
I have used LLM GenAI to check my own writing. I asked Claude to summarize all of my blog posts. This is much better than asking Claude to summarize something I have not read because I would know immediately if it “hallucinated” and made something up that sounded like something I said.
At our school, teachers of 10th, 11th, and 12th graders meet once a week to discuss issues and events, and at the end of every meeting we talk about interventions for specific students. We take good notes. I used Google’s NotebookLM, which was able to access all of our notes from the last three years that live in Google Docs. I asked NotebookLM to summarize three years of notes, organized by each student mentioned. Helpfully, NotebookLM includes the provenance of each summary, with links back to the docs where the information was found.
The problems:
Sycophancy - Guess what, Claude thought I did a really good job. This is nothing like a human editor reading for clarity. Claude is programmed to produce patterns of words that have a high likelihood of making me feel good about my work.
Omission and bias - I write a lot about race and power, but those issues were not present in the summary created by Claude. I tried it again with Gemini, and its summary did include issues of “Race and Autocracy”. All LLM GenAI models go through “tuning” before they are released. The LLM GenAI vendors do this because there are certain topics, race being one of them, that get these services in trouble. So, if you write about race, two things make their output useless for summarizing critical writing about race: the foundational and well-documented bias of these models against people of color and women, and the tuning that gets them to output only the patterns of words the vendor considers appropriate.
Security? - I know that RAG has unique security issues because you are giving it documents. That said, looking for patterns in documents about students with NotebookLM seemed reasonable to me because we had already given Google permission to secure the documents that NotebookLM was accessing. Am I wrong? Doing the summary with content that is already public, like my blog posts, also seemed reasonable. Am I wrong?
The benefit - The output demonstrated that some of the patterns I was trying to give weight to did appear in the summary, and it did make me feel good about my writing. Like a gin and tonic on Friday after a hard week. But not like a colleague taking the time to read your work and tell you what they think.
Augment my code to help me implement new libraries
The use case
I use GitHub Copilot in Visual Studio Code to help me integrate code libraries I do not know into code I know well. As a Computer Science teacher, I write code in Python, and I also use Flask, which is a Python library for making websites. I’ve been doing this for a while and understand the code pretty well. My Flask sites need a database on the backend. I used MongoDB for a long time but am in the process of switching to SQLite. I spent time trying to figure it out from the docs but failed, so I asked Copilot to refactor my code. If I had taken the time to do it myself, I definitely would have learned how SQLite works. Having Copilot do it means that I now know what SQLite looks like in my environment, serving my use cases, and my familiarity will increase over time, especially because I am working with students to learn how to use these technologies.
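For readers who have not seen SQLite from Python, here is a minimal sketch of what a MongoDB-style insert and query can look like after a switch to SQLite, using Python’s built-in `sqlite3` module. It is an illustration, not the refactor Copilot produced: the `students` table, its columns, and the helper names `get_connection`, `add_student`, and `find_students_by_grade` are hypothetical, and a real Flask app would usually open a database file per request instead of using an in-memory database.

```python
import sqlite3


def get_connection(path=":memory:"):
    """Open a SQLite database; ':memory:' keeps everything in RAM for this demo."""
    conn = sqlite3.connect(path)
    # sqlite3.Row lets us read columns by name, a bit like Mongo documents.
    conn.row_factory = sqlite3.Row
    # Hypothetical schema for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS students ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, grade INTEGER NOT NULL)"
    )
    return conn


def add_student(conn, name, grade):
    """Roughly what collection.insert_one({...}) did in MongoDB."""
    cur = conn.execute(
        "INSERT INTO students (name, grade) VALUES (?, ?)", (name, grade)
    )
    conn.commit()
    return cur.lastrowid  # the new row's id


def find_students_by_grade(conn, grade):
    """Roughly what collection.find({"grade": grade}) did in MongoDB."""
    rows = conn.execute(
        "SELECT id, name, grade FROM students WHERE grade = ? ORDER BY name",
        (grade,),
    ).fetchall()
    # Convert each sqlite3.Row into a plain dict.
    return [dict(row) for row in rows]


if __name__ == "__main__":
    conn = get_connection()
    add_student(conn, "Ada", 10)
    add_student(conn, "Grace", 11)
    print(find_students_by_grade(conn, 10))
```

The `?` placeholders let `sqlite3` substitute values safely rather than pasting them into the SQL string, which is one habit worth picking up when coming from MongoDB’s dictionary-style queries.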
The problem
I don’t learn how to implement what I am trying to implement. However, the reality is that Copilot saved soooo much time and effort that, without it, I probably would not have done the project at all.
The benefit
Learning to use a code library that is new to me is really difficult because tech docs are, in general, really hard to use. If I wrote code as a profession, I would have built a familiarity with tech documentation and my use case would have been much easier. But as a hobbyist, I can use Copilot to take big steps and then review the implementation to gain familiarity, if not fluency.
So what does this tell me about student use of LLM GenAI?
The thing that makes a student a student is the fact that they have had fewer experiences than teachers. Sometimes the context is fewer life experiences and sometimes it is fewer experiences with a specific domain. For K-12 it is usually both. Because the existence of LLM GenAI has so massively disrupted teaching students how to write code, I chose to let go of my curriculum and begrudgingly accept that there was very little I could do about the facts that:
LLM GenAI was being used by students to complete their assignments.
Computer Science and Math have a sort of permission structure where students can say things like “I am not a math person” and it will be accepted as carrying some truth, because teachers and other adults often understand themselves the same way.
Education is primarily transactional. Right answers → grades and credits → college → job → life with a house, a car, and a family and the ability to go on vacations.
One of the most potent indictments of LLM GenAI in general, and in education specifically, is that it makes it very difficult, if not impossible, to move from novice to expert.
Watching students do vibe coding has been very informative. First, they are NOT learning to code. They are gaining familiarity with code. They can read through code and find where a collision happens in a car racing game or where a particular value is rendered on a particular page in website code. This is valuable, but it is NOT learning to code. Also, this does not mean that no one needs to learn to code anymore. That is a conclusion reached only by people who don’t know how to code, and by those who benefit from the market success of vibe coding tools.
I asked students to vibe code a simple game that they already knew. Many chose Connect4. The first time I did this exercise, I was shocked at the magic, which of course is NOT magic. But students were not really even impressed. To them it was really just installing the game. The second activity was to invent a game that did not exist. Some students made really remarkable inventions. One student made a game kinda like Risk but with a dramatically simplified interface. There is real value in this if the purpose of the class is game design, what makes a good game, etc.
Maybe teaching coding is largely dead except for the few students who are, for whatever reason, drawn to it. For very old and largely craven reasons, these students are usually male. AND, those boys should totally learn to code if they want to. Do we no longer care that the tech industry serves only men, mostly white men? Starting around 2016, this was an explicit concern for many. That concern seems to have been completely erased and is now even unmentionable.
Never use LLM GenAI for brainstorming
A common recommendation from EdTech “thought leaders” is to use LLM GenAI for brainstorming. That is not possible, because LLM GenAI is both biased and intentionally limited via tuning. When you ask any student, but particularly students of color, to use LLM GenAI to brainstorm, you are leaning into white supremacy and asking students to see the world of possibilities through a white supremacist lens. Additionally, likely because of the transactional nature of education, when you ask students about the future they tell you the prescribed story about going to college and getting a job, or they just say “I’m going to have money”, or they say “I don’t know”. When we ask students to use machines to manage their imagining, we are burying them even deeper in someone else’s story of their life. If you want students to choose from a limited set, then provide that set. Otherwise, teach them to deal with the blank page. And, yeah, it’s really hard.
LLM GenAI should NEVER be the first step
For many of the same reasons as above, an assignment should never start with “Make X using LLM GenAI”. EdTech “thought leaders” will tell you that critiquing LLM GenAI output is a good exercise. It is not, unless you are explicitly doing a unit on LLM GenAI awareness. From what I can tell from talking to students, the most common usage is to ask the LLM GenAI to explain the problem to them. On its face, this is a good thing. In reality, I see evidence from students of all types that they are also using it to get the answer. They justify this with, “I understand what it wrote.” And all the studies show that their understanding of “their” content created by LLM GenAI is far shallower than their understanding of content they created themselves.
It is a 100% reasonable choice to not use LLM GenAI
Yes, this means banning it. EdTech “thought leaders” hate this. EdTech “thought leaders” will tell you that “AI is here to stay”, as if that means something more interesting than “drugs are here to stay”, “racism is here to stay”, “teen pregnancy is here to stay”, “poverty is here to stay”, “violence is here to stay”... The best thing that we can do for students is to teach them to think WITHOUT LLM GenAI. If, as they say, it is here to stay, they can become addicted later and hopefully use their actual intelligence to fight back against its negative impacts. Getting students to not use LLM GenAI at all for the class is a much bigger conversation, but there are two primary issues: 1) creating assignments that are resistant to LLM GenAI use, and 2) creating an environment where students are incentivized for learning and growth OVER regurgitation and right answers.
LLM GenAI should only be used by students to interact with work they have already done WITHOUT LLM GenAI
As with the examples I gave above from my own use, LLM GenAI should only ever be used with content the student created and knows deeply. Only then do students have enough ownership of the content to care what LLM GenAI has to say about it, and enough familiarity with it to recognize when the output is biased, sycophantic, or just making stuff up.
Using LLM GenAI to help students deal with code is much better than using it with text and ideas
Using it with code is much better, I think, than using it with words and ideas. Code is testable in ways that words are not. The sycophancy of these models makes them difficult to challenge, but code can’t be complimentary. It either does the thing or it doesn’t. And if the goal is for students to learn to code (and it seems like that is a goal we don’t care about anymore), then LLM GenAI should only be used to augment students’ existing code. It is only useful if the student has an active inquiry that the prompt is expressing. Using it with text and ideas, in my not-so-humble opinion, is only justifiable in the context of a unit on LLM GenAI awareness, OR when it is accompanied by many warnings.
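To make “it either does the thing or it doesn’t” concrete, here is a small hypothetical Python example in the spirit of the Connect4 games students built: a check for four in a row across any row of the board, followed by plain `assert` statements. The board layout (a list of rows, with `0` for an empty cell) and the name `has_horizontal_win` are assumptions for illustration, and a full game would also need vertical and diagonal checks.

```python
def has_horizontal_win(board, player):
    """Return True if `player` has four in a row in any row of `board`.

    Hypothetical layout: `board` is a list of rows, and each cell holds
    a player number or 0 for an empty cell.
    """
    for row in board:
        # Slide a window of four cells across the row.
        for start in range(len(row) - 3):
            if all(cell == player for cell in row[start:start + 4]):
                return True
    return False


# A standard 6-row by 7-column board, all empty.
empty = [[0] * 7 for _ in range(6)]
assert not has_horizontal_win(empty, 1)

winning = [row[:] for row in empty]
winning[5][2:6] = [1, 1, 1, 1]  # four of player 1 along the bottom row
assert has_horizontal_win(winning, 1)
assert not has_horizontal_win(winning, 2)

almost = [row[:] for row in empty]
almost[5][0:3] = [1, 1, 1]  # only three in a row
assert not has_horizontal_win(almost, 1)

print("all checks passed")
```

Each assert either passes silently or stops the program with an `AssertionError`. None of them can tell you that you did a really good job.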


