AI Literacy: An Introduction

Introduction

As AI becomes more and more relevant, I realized that creating resources on AI literacy would be important not only for my own learning but also for my campus. I believe libraries can and should take the lead in this endeavor. It seems natural, as libraries already exist as pillars of information in their communities, and librarians are masters of information literacy. This blog post is adapted from a YouTube video I created for my library, which can be found here.

Learning Objectives

  • Remember the difference between the terms AI and Generative AI.
  • Understand the importance of AI literacy.
  • Apply learnings to your use of AI by being mindful of the environmental impacts and knowing how to use AI ethically.
  • Analyze the strengths and weaknesses of AI use in different scenarios.
  • Evaluate Generative AI outputs for accuracy, reliability, and bias.
  • Create a plan for how you will keep yourself educated on AI topics and help keep those around you informed about AI literacy.

So, What is AI?

At its core, the term “AI” stands for Artificial Intelligence, which can loosely be defined as computers designed to mimic human intelligence to some degree. When you hear the term tossed around on social media, people are typically referring to Generative AI, a subset of AI that is specifically trained to produce some form of output, whether that be text, audio, or imagery. However, other types of AI exist and have existed for a long time.

The oldest type of AI is referred to as “Reactive Machine” AI. These systems are programmed for specific scenarios and are great at reacting to them, but they cannot perform anything beyond those duties and do not learn from their previous interactions. Examples of Reactive Machines include video game bots and IBM’s Deep Blue, the chess computer that famously beat world champion Garry Kasparov.

The other category of AI that exists currently is referred to as “Limited Memory.” Limited Memory AI is more complex than Reactive Machine AI: it is designed to imitate the human brain, which means it can draw from past experiences to help inform current situations. However, that experience is not permanently stored in the machine’s content library; the system only improves over time with new training data. Generative AI falls into this category. Another example of Limited Memory AI is the self-driving car, which is trained on a large amount of historical data as a starting point while also taking in cues from its environment. As Neil Sahota states, “every time a self-driving car is started, the learning process begins again,” meaning that while the AI does learn from each interaction, that learning is confined to the interaction unless the system is programmed to retain it, hence the name “Limited Memory.”

There are two other stages of AI that researchers have theorized about, but these do not yet exist. “Theory of Mind” AI would, as Coursera puts it, “have the potential to understand the world and how other entities have thoughts and emotions,” thus affecting its behavior in relation to those interacting with it. The other theoretical type of AI, “Self Awareness,” would not only understand the world around it but also be aware of itself.

Now that you have a basic understanding of the categories of AI, it may be easier to see how AI has been all around us for quite a while, even though it has only recently become a buzzword. The autocorrect on smartphones, Google autocomplete suggestions, text-to-speech, video transcript tools, and YouTube video recommendations are all examples of AI that you may encounter and even interact with on a daily basis.

Some tools have always had an AI component but only recently shifted their marketing strategies to highlight their AI functionality. Grammarly, for example, has blog posts on its website dating back to the summer of 2018 titled “Under the Hood at Grammarly: Detecting Disorganized Writing With AI,” but it only properly embraced AI in its marketing in early 2024, as is evident from the AI section of its blog (only four articles predate 2024, with over 50 published since March 2024).

Many other companies are rushing to integrate AI into their platforms as quickly as they can to keep up with the trend, as is evident from social media platforms like Instagram, Facebook, and X (formerly Twitter) and companies like Google and Microsoft adding AI chat assistants to the everyday functionality of their products. You may have noticed a change in Google search within the past year: the top result is now often an AI-generated answer to your question, drawn from the results Google pulls up. We’ll talk about this later.

Generative AI

Now that you have a basic understanding of the concept of AI and some examples of where you may have seen it before, we can dive a little deeper into other topics like generative AI, its uses and limitations, environmental impact, and more. 

So, what is Generative AI? Generative AI can loosely be defined as AI that generates outputs based on user inputs. How does it generate those outputs? It’s trained on large amounts of data. Some Generative AI tools specialize in text-based outputs, while others focus on images, audio, or video. Some examples of popular Generative AI tools are ChatGPT, Gemini (formerly Bard), Microsoft Copilot, Meta AI (integrated into platforms like Instagram, Facebook, and WhatsApp), Claude, DALL-E, Sora, and Midjourney. Together, these interfaces cover a wide range of output types.

Before we move on to explaining the step-by-step process of how generative AI works, let’s look at some key terms.

Important terms to know

General Terms

Machine Learning – Machine learning is the process by which an AI analyzes data, learning to find patterns and make predictions.
Neural Networks – The way an AI is structured to mimic the human brain, which helps it recognize patterns; this is a machine learning model (see the sketch after this list)!
Deep Learning – Deep learning goes a step further, using the neural network structure to learn from data at a more complex level.
Large Language Model – A type of AI that can understand and produce human writing. Not all Generative AI are LLMs, but all LLMs are Generative AI.
Natural Language Processing – The ability of a computer to understand human language; this is what allows a Generative AI to interpret the words you input and respond to them by breaking your input down into tokens.
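
To make “neural network” a little less abstract, here is a tiny, purely illustrative Python sketch (using the NumPy library and made-up numbers) of a single forward pass: inputs flow through layers of weighted sums and a simple nonlinearity. Real networks work the same way, just with millions or billions of weights that get adjusted during training.

```python
import numpy as np

# A toy neural network "forward pass" with made-up weights.
# Real networks learn their weights from data; the building block is the
# same either way: weighted sums followed by a simple nonlinearity.
rng = np.random.default_rng(0)

x = np.array([0.2, 0.7, 0.1])   # an input, e.g. three numeric features
w1 = rng.normal(size=(3, 4))    # weights for a hidden layer of 4 units
w2 = rng.normal(size=(4, 2))    # weights for an output layer of 2 units

hidden = np.maximum(0, x @ w1)  # weighted sum, then a ReLU nonlinearity
output = hidden @ w2            # another weighted sum produces two raw scores

print(output)                   # training would gradually shape these numbers
```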

Learning Types for Training the AI

Supervised Learning – Training using labeled data (questions with the answers provided); see the sketch after this list.
Semi-Supervised Learning – Training using both labeled and unlabeled data; the labeled data is used as a starting point, and the AI then has to predict answers for the unlabeled data.
Unsupervised Learning – Training using unlabeled data; the AI has to find patterns on its own, which is generally considered less reliable.
Reinforcement Learning – Training that teaches the AI to make decisions that give the best results by rewarding the answers the trainer wants to see.
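
To make the first and third learning types a bit more concrete, here is a minimal sketch in Python (using the scikit-learn library and a tiny made-up dataset) contrasting supervised learning, where the answers are provided, with unsupervised learning, where the algorithm has to find groupings on its own.

```python
# Illustrative only: a tiny made-up dataset of "emails," each described by
# two numbers (say, word count and number of links it contains).
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

features = [[120, 0], [90, 1], [15, 7], [20, 9], [110, 1], [10, 8]]
labels = [0, 0, 1, 1, 0, 1]  # the provided answers: 0 = "normal", 1 = "spam"

# Supervised learning: the model sees the features AND the labels.
supervised = LogisticRegression().fit(features, labels)
print(supervised.predict([[12, 6]]))  # predicts a label for a new email

# Unsupervised learning: the model sees only the features and must group
# similar examples together without being told what the groups mean.
unsupervised = KMeans(n_clusters=2, random_state=0).fit(features)
print(unsupervised.labels_)           # the cluster assignments it discovered
```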

User Input Terminology

Interface – The place you go to interact with a Generative AI (e.g., ChatGPT, Copilot, etc.).
Prompts – The inputs you give the AI to obtain a specific result. The way you structure your prompt can influence the output you receive, similar to how you may need to reword a Google search to get the answer you’re looking for.
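
For the curious, here is what “interface” and “prompt” look like when you skip the chat window and talk to a model from code. This is a minimal sketch assuming the OpenAI Python client and an API key in your environment; the model name is just an illustrative choice, and other providers use slightly different calls.

```python
# A minimal sketch of sending a prompt to a chat model from code.
# Assumes the OpenAI Python client (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable; other providers differ slightly.
from openai import OpenAI

client = OpenAI()  # the "interface," just without the chat window

prompt = "Suggest three narrower topics within the history of public libraries."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice; any chat model works
    messages=[{"role": "user", "content": prompt}],
)

# The "output": the model's predicted continuation of the conversation.
print(response.choices[0].message.content)
```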

How does Gen AI work?

As a baseline, let’s look deeper into how text-based Generative AI works. At the simplest level, you open the interface, type in a prompt, and the AI generates a response based on your prompt. The inner mechanics of the AI can get quite complicated to understand, so let’s break it down a little bit. 

When you hit send on a prompt, the AI breaks the prompt into tokens as part of natural language processing. Each word is broken into one or more tokens that the AI can process, rather than being handled letter by letter. A great example of how tokens work circulated in mid-2024: people noticed that when asked “How many of the letter R are in the word strawberry?”, ChatGPT would confidently reply with “2” when, of course, the correct answer is “3”. The AI would continue to respond with “2” unless asked to spell out the word and then recount. Unless prompted, it doesn’t fully process the complexities of the words you’re typing, cannot understand them on a letter-by-letter level, and is essentially just guessing what it thinks the answer is.
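
If you’d like to see tokenization for yourself, here is a small sketch using OpenAI’s open-source tiktoken library. Token boundaries vary from model to model, so treat the exact pieces as illustrative; the point is that the model sees chunks, not letters.

```python
# Peek at how a word is split into tokens before a model ever sees it.
# Uses OpenAI's open-source tiktoken library (pip install tiktoken);
# token boundaries differ between models, so the output is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by GPT-4-era models

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

print(token_ids)  # a short list of integer IDs, not letters
print(pieces)     # the chunks the model works with, e.g. ['str', 'aw', 'berry']

# The model reasons over those chunks, which is why counting the letter "r"
# is surprisingly easy for it to get wrong. Ordinary code has no such trouble:
print(word.count("r"))  # 3
```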

Once the AI has broken the prompt into tokens, it analyzes the value and order of each token to determine what you’re asking of it. Large Language Models like ChatGPT are trained on large amounts of data and go through lots of human testing (using the types of learning defined earlier) before being released to the public to help refine this phase as much as possible.

After it has analyzed and determined what the prompt says, it generates an output. The output is based on the data the model was trained on, and it may come across as if the AI “knows” the answer to your question, but this is not the case. From the input you provided, the AI is simply predicting what words would come next in the sequence it spits out, based on its training. For example, ChatGPT is quite proficient at producing the correct formatting for APA citations, as it has been trained on data that includes the correct citation format. However, when asked to provide a citation for a journal article on a specific topic, it will simply generate what a citation in that field could look like, with no guarantee that the article actually exists. Some AI, like newer versions of ChatGPT, can access the internet, giving it a higher likelihood of producing accurate information, though it still will not always provide real citations. We’ll cover an example of this later on.
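
To illustrate the “predicting the next word” idea without any real model, here is a toy sketch with a hand-written probability table. A real large language model does essentially the same thing, except its probabilities come from billions of parameters learned during training rather than a tiny dictionary I made up.

```python
import random

# A toy "language model": for a given two-word context, it stores made-up
# probabilities for what word comes next. Real models learn these
# probabilities from their training data instead of a hand-written table.
next_word_probs = {
    ("the", "librarian"): {"recommends": 0.6, "said": 0.3, "danced": 0.1},
    ("librarian", "recommends"): {"a": 0.7, "the": 0.2, "this": 0.1},
}

def predict_next(context):
    """Sample the next word according to the stored probabilities."""
    probs = next_word_probs[context]
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Generate a short continuation, one predicted word at a time.
sentence = ["the", "librarian"]
for _ in range(2):
    context = (sentence[-2], sentence[-1])
    if context not in next_word_probs:
        break
    sentence.append(predict_next(context))

print(" ".join(sentence))  # e.g. "the librarian recommends a"
```

Notice that nothing in the table “knows” anything about librarians; it only encodes which words tend to follow which, and that is enough to produce fluent-sounding output.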

The main takeaway from breaking down the steps of how generative AIs work is to emphasize that while generative AI seems to know the answer to everything, it’s just predicting what it thinks the correct response should be to any given input based on its training. To avoid attributing the concept of knowledge and understanding to generative AI, using the terminology “interpret” instead of “think,” “prompt” instead of “question,” and “output” instead of “answer” can be helpful.

Downsides to Generative AI

Now that we’ve covered the general training process and functionality of Generative AI, we need to be mindful of its potential downsides: it doesn’t know everything (sometimes it even makes things up!), it’s unclear where the training data comes from, and there is an environmental impact.

It Doesn’t Know Everything

While explaining how Generative AI works, I emphasized that the machine doesn’t magically “know” the answer to everything. It’s programmed to look at your input and produce its output based on the data it was trained on and how the human testers influenced its responses in the training process. While the computer will often produce accurate and useful information, this will not always be the case. As the AI is outputting what it interprets to be the best response, there is always a chance that that “best response” could be non-factual information. This is referred to as a “hallucination,” where the AI produces information that looks normal but is not true, whether about something that doesn’t exist or something that never happened. There have already been real-world consequences of AI hallucinations. In 2023, an editor doing research for a publication using ChatGPT received a false claim that a radio host had embezzled funds. This ultimately led the radio host to take OpenAI, the company behind ChatGPT, to court.

Earlier on, I mentioned that Google has introduced an AI summary feature into the core functionality of its search interface. Now, when you google something, the first thing you’ll see is an AI summary of what the platform interprets to be the most relevant sources for answering your input. While this can be useful, saving you the time of scrolling through several articles looking for your answer, it is not guaranteed to be correct, as is the case with any Generative AI, even though it directly references the internet. When the feature first launched, these errors were a lot more noticeable. When a user googled “cheese not sticking to pizza,” they were met with an AI summary that instructed them to add ⅛ cup of non-toxic glue to the sauce. Over time, the AI Overview feature has improved, but there is still always room for questionable references to be used in the summary. This is also related to the bias of training data. Google’s overview took what it interpreted to be a good source for answering the prompt, but in reality, it sourced its answer from a joke post on Reddit. While that was an extreme case, bias is still very real and can present itself in a variety of ways depending on which interface you’re using.

Bias can come from how you type your prompt, through confirmation bias, where the AI simply reinforces your previous ideas or beliefs about a specific topic. It can also come from the training itself, if the model is only given data that supports a particular viewpoint or if the human trainers teach it to respond negatively to specific cues. For example, studies have shown AI to be biased along lines of both race and gender, typically reproducing stereotypes that are reinforced by media and likely present in the data it was trained on. When interacting with any Generative AI, you need to be aware of the biases that may be present, both in your own prompts and in the AI itself.

Where Does the Training Data Come From

That brings us to the next point: where does the training data come from? This, of course, varies from company to company and depends on the interface you use, and unfortunately, this information isn’t always public. In a perfect world, all the training data would be ethically sourced from consenting parties, and the sources of the information would be posted on the website. In reality, though, there are enough examples of the opposite happening to make you question how all that data is being collected.

One predominant “problem” is that companies are running out of training data. Many news sources have reported on a study by the research group Epoch AI, which claims that we could be out of human-written text for chatbots to train on as early as 2026. It takes a very large amount of information to train a large language model, and this has led to less care in where the information comes from. Much of the data used to train large language models like ChatGPT is simply scraped from publicly available internet sources. The issue is that being publicly available doesn’t mean the data is free to use. You could consider things like blog posts, newspaper articles, and YouTube videos “publicly available,” but someone technically owns that information and isn’t necessarily willing to allow it to be used for AI training.

In 2023, OpenAI and Microsoft were accused of using stolen personal data for training. In early 2024, the lawsuit was dismissed for being “excessive in length” and containing “distracting and unnecessary allegations.” Both companies denied the claims against them. In the summer of 2024, it was discovered that several large companies were using YouTube video subtitles as training data without the permission of the content creators. This violates YouTube’s terms of service, which prohibit accessing its videos by “automated means,” as code was used to scrape the captions from the platform, but clearly, in the rush for information, those who created the dataset did not care. In December 2024, YouTube announced it would be introducing a feature to allow creators to opt in to sharing their video data with third-party companies for AI training. Rather than introducing more policies to help protect the platform’s creators, YouTube simply made it easier for other companies to access the platform’s content. While this is unfortunate, it is not surprising, as during the summer of 2024, Google updated its privacy policy to state that it could use any publicly available data on its platforms to train its AI systems.

And Google isn’t alone in this shift. Companies like Meta, X (formerly Twitter), and LinkedIn have started training their AI on everything you post to their platforms. Some of the platforms allow you to opt out of this training, but that isn’t always the case, so if this is something you care about, it might be worth looking into the next time you go to post on social media.

Environmental Impact

The last potential downside of Generative AI is the environmental impact. Most people don’t tend to think of technology use as having an impact on the environment, so the numbers might shock you. To start, let’s look at a Google search: one Google search uses approximately 0.5 milliliters of water in the energy used to return the search results.

Now, let’s look at one of the most popular Generative AI interfaces, ChatGPT. Generating one 100-word email with ChatGPT using the GPT-4 model uses 519 milliliters of water, slightly more than one 16.9-ounce bottle. Similarly, for every series of 10-50 prompts you send (for example, a conversation about a topic with ChatGPT), you use about 500 milliliters of water. Many companies have been reporting an increase in water consumption over the last few years, and this consumption is only going to grow as AI is used by more and more people. In 2022, Microsoft (a funder of OpenAI) shared that one of its data centers in Iowa used 1.7 billion gallons of water, a 34% increase from the previous year, and Google, while training its Generative AI, Bard (now Gemini), used 5.6 billion gallons of water, a 20% increase from the previous year. And that was in 2022. In 2023, Microsoft’s use increased to over 2 billion gallons, and Google’s use increased to 6.4 billion gallons. Both companies aim to match and even exceed their water use with replacement by 2030, but with water use rising, will this be achievable? Microsoft has replaced 16 billion gallons of water since 2020 and claims that its AI workloads will not consume water, while Google was only able to replace 1 billion gallons of fresh water in 2024. The main thing you can do as a potential user of AI is to be mindful of the environmental impact your use might have when deciding whether you will use it: 0.5 milliliters for a Google search versus up to 500 milliliters for a conversation with ChatGPT.
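
Taking the figures quoted above at face value (they are rough estimates that vary by study, data center, and model), a quick back-of-the-envelope comparison looks like this:

```python
# Back-of-the-envelope comparison using the estimates quoted above.
# These figures are rough and vary by study, data center, and model.
google_search_ml = 0.5   # water per Google search, in milliliters
chatgpt_email_ml = 519   # water per ~100-word GPT-4 email, in milliliters
chatgpt_convo_ml = 500   # water per 10-50 prompt conversation, in milliliters

print(chatgpt_email_ml / google_search_ml)  # ~1038 searches' worth of water per email
print(chatgpt_convo_ml / google_search_ml)  # ~1000 searches' worth per conversation
```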

How you might interact with Gen AI

Now that you’re aware of the potential pitfalls of the technology (hallucinations, potentially stolen data, and environmental impact), we can get into how you might use Generative AI, where you might avoid using it, how to fact-check it, and how to be aware of encounters with AI outside of your own use.

Where might you use Gen AI?

Where you might use Generative AI depends on a lot of things. If you’re a student, you might think about how you can use it to help plan out projects, get help picking a topic for a research paper, or even draft an outline for an essay. The first step, however, is making sure you check your syllabus for any AI-related policies. Different instructors will have different policies on Generative AI use in their courses, so make sure you aren’t breaking any rules before you decide to use it. If you’re a faculty member, you might use Generative AI to help you develop icebreakers for a new class, brainstorm ways to make your classes more interactive, or even help you plan a new workshop or lesson idea. If you’re a staff member, you might think of ways Generative AI can help you streamline various daily tasks, such as planning events or coming up with marketing campaigns. Even outside of work, you might find uses for Generative AI, such as planning a new daily routine or making a schedule for cleaning your house.

Where might you NOT use Gen AI?

While there are a lot of opportunities for places you might think about incorporating Generative AI into your work and life, there are also considerations for where you might want to avoid using it. This part is extremely subjective and will be up to you to decide on. Let’s break this into two categories: creative works and academic works.

Creative Works

Starting with creative works: some people are particularly against the use of Generative AI for creative endeavors such as creative writing, video creation, and imagery. Within the past year or so, you may have seen actors, writers, and video game performers going on strike to protest the use of AI in film and other media. The use of Generative AI in creative industries threatens people’s jobs, as studios want to cut costs by using AI to replace the work of real people. In addition to these strikes, artists have been worried about how Generative AI may impact their work. AI image generation has been a growing trend on social platforms, which is problematic for artists for two reasons: just like the text-based models, the image-generation models are trained largely on “publicly available” work taken without artists’ permission, and people are less likely to commission artists, instead turning to AI to generate an image of their liking. Additionally, with social media platforms introducing policies about using all posts to train their AI, artists are having to think about different ways of promoting their work, making it more difficult for their businesses to thrive.

Academic Works

When thinking about academic works, you need to be mindful of plagiarism. You can technically cite the AI chatbot (some citation styles have already introduced citation methods for this), but where is the bot getting its information from? How does that differ from citing a research paper? Someone, somewhere, must have done the research to get the bot to give you the answer you’re looking for, right? This is why, in my previous examples, I avoided recommending outright asking the Generative AI for answers to questions: not only because of the previous points about hallucinations but also because of the matter of citation.

Using AI Ethically

A good rule of thumb to follow to use Generative AI ethically is to use it as a tool to help assist you with your work or other tasks, but avoid using it to create any form of final product. If you’re having trouble with an essay, don’t ask a chatbot to write it; instead, try asking for topic recommendations or an outline. If you’re having trouble finding materials for your topic, instead of asking a chatbot to give you some citations or articles to use, ask it what topics are related to the one you’re looking for or for some prominent authors or experts in that field you can then look for in a library database (or, even better, you could ask a librarian!). If you’re looking for the answer to a specific question, you’re probably better off googling it!

Being Aware of Gen AI

Fact-Checking Gen AI

Of course, if you’re enthusiastic about using Generative AI and do decide to use it as a source of knowledge, be cautious of hallucinations. Remember, the information it provides will not always be 100% factual, and biases may be present. For example, ChatGPT has been known to create fake citations. The citations will look perfectly formatted and seem like legitimate sources, but even if the interface has access to the internet, there is still room for error. If you do decide to ask for citations for a paper or for quotes that support your paper’s claim, do not use the output without finding the actual source of the information first.

If you decide to ask a Generative AI for information on a particular topic, make sure you find a research article that backs up the claims in the output. No matter how genuine the information looks, unless you are an expert in that field and can fact-check it yourself, you should not assume that the information is correct. Another key thing to be aware of is that, depending on which Generative AI interface you’re using, it may not have access to the internet and thus will not necessarily be up to date on recent information or news. As the training process can be extensive and take some time, data collection often stops before training begins, which is why the model can be a year or more behind in its facts about “recent” information. For example, an article written in March 2023 described querying ChatGPT about an author’s most recent book publication: ChatGPT named a title published in April 2021, when the correct answer was a book published in November 2022.

At this point in the video, I walk through fact-checking some citations provided by ChatGPT. If you’re interested, this link will take you there.

Generative AI in the Wild

While we’ve talked about how you might use Generative AI, it’s also important to be mindful of how others might be using it. The surge of Generative AI use has been rapidly accelerated by social media platforms, particularly through trends around generating images or music. More concerning, when thinking about misinformation, is that AI-generated videos have been getting a lot better. To the trained or even just alert eye, it’s usually not too difficult to tell when something was generated with AI. However, when you’re not knowledgeable about it or don’t keep it in mind, things can slip under your radar. When sharing images or news articles that seem particularly unbelievable, make sure you give them a second look before sharing. Fake news has been around for ages, but Generative AI seems to be increasing the spread of misinformation online. You never know when there might be an AI-generated image of a fake animal or a clip of something that never happened.

Beyond social media, you also need to be on the lookout for AI-generated content in academia. There’s a website called Retraction Watch that maintains a list of papers and peer reviews with evidence of ChatGPT writing. It’s remarkable not only that these researchers copied and pasted from ChatGPT while accidentally leaving in telltale signs of their AI use, but also that the editors and reviewers missed these mistakes. This is unfortunate and leads one to wonder how many people are getting away with using ChatGPT and other Generative AI in publications without disclosing it.

Here’s an example: one journal article, under the heading “Future Scope,” starts with “As an AI language model, I cannot predict the future, however, here are a few potential future scopes,” which makes it clear that the author of that paper copied and pasted directly from ChatGPT.

AI and Information Literacy

Lastly, let’s wrap up by discussing why learning about AI is important for information literacy. Information literacy is the ability to locate, evaluate, and use information effectively while understanding how that information was produced and knowing how to create information yourself. Generative AI touches every one of those areas: how we locate, evaluate, use, understand, and create information. AI literacy is simply a new part of information literacy that will be important for everyone to learn as Generative AI becomes more and more prevalent in our day-to-day lives: knowing how it works, understanding its weaknesses, being able to use it, and recognizing when it’s being used. The goal of this article was to introduce you to the AI literacy learning process, but your learning won’t stop here. The technology is advancing rapidly, and it would take hours to cover all the different aspects of AI: diving deeper into the training processes, how the algorithms work, how each unique Generative AI works, how it’s impacting every field of work, and more. There are lots of videos, plenty of online courses, and many, many online resources that can help you continue your journey of learning about AI.

If you have any questions, please feel free to contact me at crabtra@sunypoly.edu. Thank you for your time!
