.GGUF File Extension

GGML Universal Format File

Developer	ggml-org

What is a GGUF file?

A GGUF file is a machine learning model file that stores large language models (LLMs) and other AI models using the GGML Universal Format (GGUF). It packages the information needed to run a model into a single file, which may include model weights, tokenizer data, configuration information, and metadata. The GGUF format was introduced for the llama.cpp ecosystem and is commonly distributed through model repositories such as Hugging Face.
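To show how these pieces sit together in one file, here is a minimal Python sketch that reads the start of a GGUF file. It reads the 4-byte magic "GGUF", the format version, the tensor count, and the metadata key-value pairs. The sketch makes some assumptions: a little-endian file in GGUF version 2 or later (version 1 used 32-bit counts), and only the Python standard library. The function names are illustrative and are not part of any llama.cpp or ggml API. The tensor descriptions and weight data that follow the metadata are not parsed.

```python
import struct
from typing import Any, BinaryIO

GGUF_MAGIC = b"GGUF"

# Scalar metadata value types from the GGUF specification, mapped to
# little-endian struct format codes (assumes a little-endian file).
_SCALAR_FORMATS = {
    0: "<B",   # UINT8
    1: "<b",   # INT8
    2: "<H",   # UINT16
    3: "<h",   # INT16
    4: "<I",   # UINT32
    5: "<i",   # INT32
    6: "<f",   # FLOAT32
    7: "<?",   # BOOL
    10: "<Q",  # UINT64
    11: "<q",  # INT64
    12: "<d",  # FLOAT64
}
STRING_TYPE = 8
ARRAY_TYPE = 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, value_type: int) -> Any:
    """Decode one metadata value; arrays carry an item type and a count."""
    if value_type == STRING_TYPE:
        return _read_string(f)
    if value_type == ARRAY_TYPE:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    if value_type in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[value_type])
    raise ValueError(f"unknown metadata value type {value_type}")


def read_gguf_header(f: BinaryIO) -> dict:
    """Return the version, tensor count, and metadata of a GGUF file."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        # Version 1 used 32-bit counts, which this sketch does not handle.
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        metadata[key] = _read_value(f, value_type)
    return {"version": version, "tensor_count": tensor_count, "metadata": metadata}
```

For example, you can open a downloaded model with open(path, "rb") and pass the file to read_gguf_header. The result should expose keys such as general.architecture. The exact key names depend on the model and on the tool that wrote the file.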

More Information

GGUF files are designed to make AI models easier to distribute and run locally. The format is widely used with tools that perform model inference, including llama.cpp and software built on that technology. Developers, researchers, hobbyists, and users running local AI assistants may work with GGUF files to deploy models on desktops, laptops, and other devices.

Many GGUF files contain quantized model data. The weights are stored at reduced precision (roughly 4 or 8 bits per weight instead of 16). This cuts memory usage and lets models run on consumer hardware, trading a small amount of model quality for much lower hardware requirements. Some formats split model weights, tokenizer files, and configuration files into separate files; GGUF instead stores these resources together in a single container. The format stores metadata as key-value pairs and supports multiple tensor data types, including several quantization schemes. This lets software read model information before loading the weights. GGUF files are also designed to be memory-mapped, which speeds up loading.
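The memory savings from quantization can be estimated with simple arithmetic. In ggml's Q4_0 type, each block of 32 weights is stored in 18 bytes: a 16-bit scale plus 32 packed 4-bit values, or 4.5 bits per weight. Q8_0 stores 32 weights in 34 bytes, or 8.5 bits per weight. The sketch below applies these layouts to a hypothetical 7-billion-parameter model. It assumes every weight uses the same type, while real GGUF files often keep some tensors (such as normalization weights) at higher precision. Treat the result as an estimate of weight storage only, not an exact file size.

```python
# (bytes per block, weights per block) for a few ggml tensor types.
BLOCK_LAYOUTS = {
    "F32": (4, 1),
    "F16": (2, 1),
    "Q8_0": (34, 32),  # 16-bit scale + 32 signed 8-bit values
    "Q4_0": (18, 32),  # 16-bit scale + 32 packed 4-bit values
}


def bits_per_weight(tensor_type: str) -> float:
    """Effective storage cost per weight, including the per-block scale."""
    block_bytes, block_weights = BLOCK_LAYOUTS[tensor_type]
    return block_bytes * 8 / block_weights


def estimated_weight_bytes(n_params: int, tensor_type: str) -> int:
    """Bytes needed if all n_params weights use tensor_type."""
    block_bytes, block_weights = BLOCK_LAYOUTS[tensor_type]
    n_blocks = -(-n_params // block_weights)  # ceiling division: partial blocks still cost a full block
    return n_blocks * block_bytes


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical model size
    for t in ("F16", "Q8_0", "Q4_0"):
        gb = estimated_weight_bytes(n_params, t) / 1e9
        print(f"{t}: {bits_per_weight(t):.1f} bits/weight, ~{gb:.2f} GB")
```

With these assumptions, the weights take 14 GB at F16 but only about 3.9 GB at Q4_0. That difference is why quantized GGUF files are common for running models on laptops and desktops.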

NOTE: GGUF replaced earlier GGML-related formats and has become a common distribution format for local AI inference workflows.

How to open a GGUF file

You can open and use a GGUF file with software that supports the format, such as llama.cpp and LM Studio. Many users download GGUF models from Hugging Face and load them into local AI tools for inference.

For example, to use a GGUF file with llama.cpp, run the llama-cli command-line tool. Pass the model file with the -m option and a prompt with the -p option: ./llama-cli -m model.gguf -p "your prompt". The program does not open the file like a document. Instead, it loads the GGUF model at runtime and uses it to generate text in response to the prompt.
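If you want to call the same command from a script, here is a minimal Python sketch built around the -m and -p options shown above. It assumes a llama-cli binary at ./llama-cli and a model named model.gguf; both are placeholders. The helper names are hypothetical and not part of llama.cpp. Passing the arguments as a list (rather than a single shell string) avoids quoting problems when the prompt contains spaces or quotes. Depending on your llama.cpp build, llama-cli may start an interactive chat session instead of exiting after one response. Check ./llama-cli --help for your version before relying on this.

```python
import subprocess


def build_llama_cli_command(model_path: str, prompt: str,
                            binary: str = "./llama-cli") -> list:
    """Argument list equivalent to: ./llama-cli -m MODEL -p PROMPT (hypothetical helper)."""
    return [binary, "-m", model_path, "-p", prompt]


def run_llama_cli(model_path: str, prompt: str,
                  binary: str = "./llama-cli") -> str:
    """Run llama-cli once and return its standard output (hypothetical helper)."""
    cmd = build_llama_cli_command(model_path, prompt, binary)
    result = subprocess.run(
        cmd,
        capture_output=True,       # collect stdout/stderr instead of printing them
        text=True,                 # decode output as text
        check=True,                # raise CalledProcessError on a non-zero exit code
        stdin=subprocess.DEVNULL,  # give the process no terminal input to wait on
    )
    return result.stdout


if __name__ == "__main__":
    # Only builds and prints the command, so this runs without llama.cpp installed.
    print(build_llama_cli_command("model.gguf", "your prompt"))
```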

Programs that open or reference GGUF files

All Platforms

llama.cpp

Free

LM Studio