Training my own 70 million parameter Chatbot (like ChatGPT)

With the release of the Dolly-15k dataset from Databricks and the Pythia-70M model from EleutherAI, it’s easy to train your own ChatGPT-like chatbot using the Hugging Face Transformers library!
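If you want to reproduce the basic setup, a minimal fine-tuning sketch with the Transformers Trainer might look like the following. It assumes the databricks/databricks-dolly-15k dataset and the EleutherAI/pythia-70m checkpoint on the Hub; the prompt format and hyperparameters here are illustrative, not necessarily what I used.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Dolly-15k: ~15k human-written (instruction, context, response) records.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
tokenizer.pad_token = tokenizer.eos_token  # Pythia's tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

def tokenize(example):
    # Collapse each record into a single instruction/response string (format is illustrative).
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pythia-70m-dolly", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()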

I trained several chatbots on different sized models (from 70 million parameters up to 1.4 billion parameters). At 70 million parameters the answers were sometimes properly structured, but heavily hallucinated. As I worked my way up to 1.4 billion parameters, I started to get legitimate answers!

Loss based on model size; lower means better. Reducing loss at lower values is more impactful: for example, going from 1.5 to 1.4 tends to be a much bigger improvement than going from 3.0 to 2.9.

How did the models do? I asked each one 5 questions:

Question 1: What is the capital of Peru?
Pythia-70m: The capital of Peru is the capital of Peru.

Pythia-410m: The capital of Peru is Lima.

Pythia-1b: The capital of Peru is Lima.

Pythia-1.4b-LoRA (8-bit): Lima is the capital of Peru and is the largest city in the country.

Question 2: What is the boiling point of water?
Pythia-70m: The boiling point of water is about 20.5 degrees Celsius.

Pythia-410m: The boiling point of water is the temperature at which boiling water will vaporize and become steam.

Pythia-1b: The boiling point of water is 212 degrees Fahrenheit.

Pythia-1.4b-LoRA (8-bit): The boiling point of water is 212.4 °F (100 °C).

Question 3: Explain the RGB color space.
Pythia-70m: RGB is a color space that is used to represent the light in a wide range of colors. This space is also used to represent the light in a wide range administrative area.

Pythia-410m: The RGB color space is a color space that represents a color as a single number. It is a color space that is used in the image processing field to represent a color as a single number.

Pythia-1b: RGB stands for Red, Green, Blue. The RGB color space is a color space that describes the way that light is perceived by humans. It is a three-dimensional space that is comprised of a red, green, and blue color.

Pythia-1.4b-LoRA (8-bit): The RGB color space is a color representation of the red, green, and blue components of light. It is a common color space used in color science and computer graphics.

Question 4: Share an inspirational quote.
Pythia-70m: “I am so honored to be the first person to write a book that I could write for my family and friends.” – John D. Roc

Pythia-410m: “I don’t believe in coincidences, I believe in the ability to predict the future.”

Pythia-1b: “If you want to be happy, be happy. If you want to be healthy, be healthy. If you want to be successful, be successful.”

Pythia-1.4b-LoRA (8-bit): “The world is a book and those who do not travel read only one page”.

Question 5: What is the best flavor of Starburst?
Pythia-70m: The best flavor of Starburst is the flavor of the sun. It is very sweet and very light.

Pythia-410m: The best flavor of Starburst is the one with the most carbon dioxide. It is the most bitter. The other flavors are not as bitter. The other flavors are the most sweet.

Pythia-1b: The best flavor of Starburst is a combination of chocolate and vanilla.

Pythia-1.4b-LoRA (8-bit): The answer is subjective. Some people enjoy the flavor of Starburst, while others do not. Some people enjoy the flavor of Starburst with a hint of cinnamon and nutmeg, while others do not.

Even for the smallest model, answers are returned properly structured (for the quote, I did not add the quotation marks; the model returned them). The models clearly hallucinate less as the number of parameters grows.

This is all with fine-tuning on just 15k examples. “Snarky” responses such as “The capital of Peru is the capital of Peru” (which is technically correct) can be eliminated with more training and feedback, which also raises answer quality for a same-sized model.

The 70 million parameter model can comfortably run on a CPU, while the 1.4 billion parameter model can run on an 8GB GPU.
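For the larger models, 8-bit loading plus LoRA is what makes the 1.4B model trainable on an 8GB card. Here is a rough sketch using the peft and bitsandbytes libraries; the LoRA rank and other settings are illustrative, not necessarily what I used.

import transformers
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 8-bit (requires bitsandbytes) so it fits in ~8GB of VRAM.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1.4b", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections; only these small matrices are trained.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, bias="none",
    target_modules=["query_key_value"],  # attention projection name in GPT-NeoX / Pythia
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count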

Image Prompt Generator

Check out my Hugging Face space for the Image Prompt Generator (AKA Image Fungineer).

Get amazing image generations from simple 1-3 word prompts.

Models like Midjourney, Stable Diffusion and DALL·E 2 are capable of generating beautiful images, but they require detailed prompts that can be time-consuming to imagine and write. The fungineer takes just a few words and prompt-engineers them into something far more imaginative. The following examples show a simple prompt followed by its re-imagined version:

“Marble Statue”
A mesmerizing image depicting a majestic marble statue suspended in the air, surrounded by a dazzling display of swirling, sunlit gasps of air. The statue, suspended in mid-air, is adorned with intricate, intricate designs, from leaves to veins, glowing with vibrant colors and contrasting patterns. The surrounding area is bathed in an ethereal, ethereal light, casting dappled shadows and highlighting the intricate details of the sculpture. wide aperture, standard lens, medium depth of field
“Basketball”
An imaginative image of a basketball player leaping into the air as they strike the ball. The player, dressed in an off-the-shoulder outfit, is adorned with a dazzling array of colorful and metallic studs, displaying their incredible speed and power. Their body is draped in a dazzling array of high-tech equipment, including a power rack, a motion-sensitive visor, and a ball gown. The background features a breathtaking cityscape filled with people in colorful attire, such as colorful sneakers, and a dramatic cityscape filled with skyscrapers. wide-angle lens, wide-angle lens, high contrast

More examples can be found on the Hugging Face Space.

There are two models: one with 355 million parameters and another with 7 billion parameters. Both were trained on a dataset that used ChatGPT to convert boring image prompts into exciting ones.
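Using either model is an ordinary text-generation call. Here is a sketch with the Transformers pipeline; the model ID below is a placeholder, not the real checkpoint name, and the sampling settings are illustrative.

from transformers import pipeline

# Placeholder model ID; substitute the actual checkpoint from the Hugging Face Space.
fungineer = pipeline("text-generation", model="your-username/prompt-fungineer-355M")

# Hand it a terse prompt and let it expand the text into a detailed image prompt.
result = fungineer("Marble Statue", max_new_tokens=200, do_sample=True, temperature=0.9)
print(result[0]["generated_text"])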

I found that the 355M parameter model tended to produce slightly incoherent prompts, but they were still coherent enough to make beautiful images. That incoherence often led to far more imaginative and interesting results.

The 7B parameter model consistently produced well-worded prompts, but that resulted in less interesting images from the image generators. Despite the size difference, my preferred final images actually came from the smaller model.

I tested many different pre-trained base models along with LoRA and Prefix Tuning. With an increasing number of parameters I was able to reduce loss further, but the cost and time for inference were much higher for only slightly better results. Here are the test results:

Running Llama-7B on Windows CPU or GPU

This post is being written during a time of quick change, so chances are it’ll be out of date within a matter of days; for now, if you’re looking to run Llama 7B on Windows, here are some quick steps.

Code Repo: https://github.com/treadon/llama-7b-example

Start by running PowerShell. Create a new directory and enter it.

mkdir llama
cd llama

I am assuming you have Python and pip already installed; if not, you can ask ChatGPT for installation steps.

Next, create a Python virtual environment. You can skip this step, but since the setup currently requires nightly builds of PyTorch (for flash attention) and an unmerged branch of transformers, keeping everything isolated in a virtual environment is strongly recommended.

python -m venv .venv
.\.venv\Scripts\Activate.ps1

This should create and activate a virtual Python environment. Next we’re going to install everything you need:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
pip install git+https://github.com/huggingface/transformers
pip install sentencepiece

This will take a few moments.

Now create a file called llama.py with the following body:

import transformers

# Run on CPU by default; see the note below for GPU usage.
device = "cpu"

# Download the tokenizer and 7B model weights from the Hugging Face Hub (a large download).
tokenizer = transformers.LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = transformers.LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf").to(device)

# Tokenize the prompt into input IDs.
batch = tokenizer(
    "The capital of Canada is",
    return_tensors="pt",
    add_special_tokens=False
)

# Move the tensors to the target device and generate up to 100 tokens.
batch = {k: v.to(device) for k, v in batch.items()}
generated = model.generate(batch["input_ids"], max_length=100)
print(tokenizer.decode(generated[0]))

That’s all there is to it! Run it with “python llama.py” and you should be told the capital of Canada. You can modify the above code as you desire to get the most out of Llama!

You can replace “cpu” with “cuda” to use your GPU.
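If you do switch to the GPU, loading the weights in half precision keeps memory usage manageable (roughly 14GB in fp16 versus about 28GB in fp32 for the 7B model), and sampling parameters give more varied completions. Here is a rough variant of the script above, with illustrative settings:

import torch
import transformers

device = "cuda"

tokenizer = transformers.LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
# torch_dtype=torch.float16 halves memory use compared to the default fp32 weights.
model = transformers.LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
).to(device)

batch = tokenizer("The capital of Canada is", return_tensors="pt", add_special_tokens=False)
batch = {k: v.to(device) for k, v in batch.items()}

# Sampling instead of greedy decoding gives more varied completions.
generated = model.generate(
    batch["input_ids"], max_length=100, do_sample=True, temperature=0.8, top_p=0.95
)
print(tokenizer.decode(generated[0]))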