My Notes on LLMs: Intro

Pedro Orlando Acosta Pereira
Sep 22, 2024 · 5 min read


What impresses us most about AI isn’t just its cognitive and creative abilities, but the fluency and naturalness of its responses. It’s as if behind every server there’s a hidden human answering our questions with an all-knowing encyclopedia. If such a human existed, they’d be called “LLM”.

What is a Large Language Model (LLM)?

A Large Language Model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks. LLMs are trained on huge sets of data — hence the name “large.” LLMs are built on machine learning: specifically, a type of neural network called a transformer model.

To keep this article concise, I'll give only brief definitions of the AI-related concepts mentioned in the definition above, and skip the deeper details.

These are:

  • Machine learning: a class of algorithms that learn and improve from experience without being explicitly programmed.
  • Transformer model: a neural network architecture that transforms an input sequence into an output sequence by learning context and the relationships between sequence elements.
  • Datasets (huge sets of data): collections of data that are extremely large, complex, and fast-growing.
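To make the "learning patterns from data" idea concrete, here's a toy sketch of my own (not from any real LLM): a bigram model that counts which word tends to follow which in a tiny corpus, then predicts the next word. Real LLMs learn vastly richer patterns with billions of parameters, but the core task is the same: predict the next token from patterns seen during training.

```python
from collections import Counter, defaultdict

# Tiny "training set" of text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram counts).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" more often than "mat" or "fish"
```

It's a deliberately crude model, but it shows learning from data rather than explicit programming: nothing in the code says "cat" follows "the"; the counts discovered that.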

In short:

  • LLMs are limited by the datasets they are trained on and their capabilities as models. This explains why different LLMs don’t offer the same results and performance, even when they receive the same prompt.

LLMs vs. GPTs: Are they the same?

To answer this, I asked the LLM Mistral Large 2. Its response:

LLM (Large Language Model) and GPT (Generative Pre-trained Transformer) are related but distinct concepts in the field of natural language processing (NLP) and machine learning. Here’s how they differ:

LLM (Large Language Model):

LLM is a broad term that refers to any language model with a large number of parameters (typically in the range of billions) trained on vast amounts of text data.

The goal of LLMs is to understand and generate human language by learning patterns from the training data.

Examples of LLMs include models like BERT, RoBERTa, T5, and indeed, GPT models.

GPT (Generative Pre-trained Transformer):

GPT is a specific type of LLM developed by OpenAI. It’s based on the transformer architecture, which uses self-attention mechanisms to process and generate text.

GPT models are pre-trained on a diverse range of internet text using unsupervised learning, allowing them to generate human-like text and perform a wide range of language tasks with minimal fine-tuning.

GPT models are a subset of LLMs, but not all LLMs are GPT models.

In summary, while all GPT models are LLMs, not all LLMs are GPT models. GPT refers to a specific family of models developed by OpenAI, while LLM is a broader term that encompasses many different large-scale language models.

So, if you think you know a lot about AI just because you use ChatGPT, I've got some bad news, my friend: you're in a tiny boat in the middle of a vast ocean. On the bright side, there's still a wealth of fascinating knowledge waiting to be discovered! Let this Medium series help you set sail.

Exploring the Universe of LLMs and AI Platforms

Okay! So, if ChatGPT or Gemini are just the shining stars of my Big Dipper, what’s the rest of the sky like?

The short answer is: if we count every parameterization and/or instruction setup of a model as a distinct model, then there are thousands of models out there.

Given the vast number of LLMs, it's impossible to mention them all. Instead, here are some popular platforms that provide access to leading language models. We'll delve deeper into these in future articles:

  • ChatGPT (OpenAI)
  • Gemini (Google)
  • Claude (Anthropic)
  • Le Chat (Mistral AI)

These are the most well-known platforms that use LLMs and offer a chatbot experience. Users can input instructions, and the platform efficiently fulfills requests with outstanding results.

Why explore other AI options when you already have a good one?

Well, it’s important to understand that each model has unique strengths and limitations, which can affect their performance on various tasks. This applies not only to their performance but also to the information they provide.

Let’s look at some examples to illustrate this point:

  • Platforms like Gemini and ChatGPT are multimodal, processing and generating outputs from various data types including text, images, audio, and video. In contrast, Claude and Mistral focus primarily on text generation and understanding. While Gemini and ChatGPT can manipulate images, Claude and Mistral are limited to describing them textually.
  • Claude excels in providing an intuitive interface for code comprehension and generation. While other platforms produce quality code, their output isn’t as readily usable as Claude’s.
  • Gemini offers a 2M token context window, while Claude provides 200K, and Mistral and ChatGPT offer 128K tokens each. The AI context window is like the AI’s short-term memory. It determines how much text the AI can work with at one time. A bigger context window means the AI can handle longer conversations or documents. This helps the AI stay on topic and give relevant responses, even in long interactions.

As you can see, each tool has its strengths. For long conversations or threads, Gemini might be your best bet. If you’re looking for quick, efficient coding and copy-pasting, Claude could be the way to go. Ultimately, the right tool depends on your specific needs and goals.

What’s next?

Given the context above, it’s clear there’s a wealth of knowledge to explore about AI and these specific platforms. That’s why I’ve decided to write “My Notes on LLMs”. I’ll be experimenting with these platforms for at least a week and sharing my experiences in a new article each weekend. Don’t worry — I’ll delve into technical details and deep AI concepts. This won’t be a mere diary of my prompts.

Finally, I intend to demonstrate how to use these platforms’ APIs, enabling you to create your own versions of them.
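As a small preview, here's a sketch of the request body for an OpenAI-style chat completions call. No network request is made here; the model name is illustrative, and each platform (OpenAI, Anthropic, Google, Mistral) has its own SDK and request schema, which we'll cover properly in later articles.

```python
import json

def build_chat_request(user_prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble the JSON body an OpenAI-style chat endpoint expects."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

payload = build_chat_request("What is an LLM?")
print(json.dumps(payload, indent=2))
```

The `messages` list is the conversation history itself, which ties back to the context-window discussion above: everything the model "remembers" has to fit in that list.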

That being said, see you soon!


A gentle reminder:

This article series, “My Notes on LLMs”, is a collection of my personal insights, opinions, and reflections on Large Language Models, their platforms, and applications. It’s written by a human — me — who’s constantly learning and evolving. I respect differing viewpoints and preferences. If these articles help you better understand LLMs or awaken your curiosity, they’ve fulfilled their purpose.

Thank you for reading!
