
Note: This article covers advanced integration techniques for Unity 6 and 2022 LTS.
The era of scripted state machines is fading. We are moving from “Artificial Intelligence” that is merely a series of if-else statements to actual intelligence—neural networks that run in real-time within your game loop.
If you are a Unity developer asking “How do I put AI in my game?”, the answer is no longer just “A* Pathfinding.” Today, it means embedding Large Language Models (LLMs) for dynamic dialogue, using Reinforcement Learning for complex physics, and leveraging on-device inference engines.
This guide covers the three pillars of modern Unity AI integration:
1. The Brain: Local LLMs with LLMUnity.
2. The Body: Physics and movement with ML-Agents and Sentis.
3. The Workflow: Coding faster with Cursor AI.
1. The Brain: Integrating Local LLMs with LLMUnity
Cloud-based APIs like OpenAI are great, but they introduce latency and costs, and they require an internet connection. For games, Local LLMs are the gold standard. They run entirely on the player’s device, ensuring privacy and eliminating network latency.
The best tool for this in 2025 is LLMUnity, a wrapper around llama.cpp that allows you to run quantized models (like Llama-3 or Mistral) directly inside Unity.
Step-by-Step Implementation
- Install the Package:
  - Open the Unity Package Manager.
  - Click `+` > “Add package from git URL”.
  - Enter: `https://github.com/undreamai/LLMUnity.git`
- Setup the Scene:
  - Create an empty GameObject named `AI_Manager`.
  - Add the `LLM` component to it.
  - Click “Download Model” in the inspector to fetch a quantized model (e.g., `Mistral-7B-Instruct-v0.2.Q4_K_M.gguf`). This model is optimized for performance (~4GB RAM).
- Create the Character:
  - Create your NPC GameObject.
  - Add the `LLMCharacter` script.
  - Link the `AI_Manager` to the `LLM` field.
  - Prompt Engineering: In the “Prompt” field, define the persona:
    > “You are Eldric, a grumpy blacksmith who hates magic. Speak in old English.”
- The C# Script: Here is how to interact with it via code:
```csharp
using UnityEngine;
using LLMUnity;

public class NPCInteraction : MonoBehaviour
{
    public LLMCharacter eldricAI;

    public void AskBlacksmith(string playerQuestion)
    {
        // HandleReply is the callback called as the AI generates text;
        // ReplyCompleted fires once generation finishes
        _ = eldricAI.Chat(playerQuestion, HandleReply, ReplyCompleted);
    }

    void HandleReply(string response)
    {
        Debug.Log($"Eldric says: {response}");
        // Hook this into your UI text box here
    }

    void ReplyCompleted()
    {
        Debug.Log("Eldric has finished speaking.");
    }
}
```
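To see it in gameplay terms, here is a minimal sketch that feeds a uGUI `InputField` into the script above. The field names and scene wiring are assumptions for illustration:

```csharp
using UnityEngine;
using UnityEngine.UI;

// Hypothetical wiring: forward the player's typed question to the blacksmith
// when they finish editing the input field.
public class DialogueInput : MonoBehaviour
{
    [SerializeField] InputField questionField;   // assumes a uGUI InputField in the scene
    [SerializeField] NPCInteraction interaction; // the script from the previous example

    void Start()
    {
        questionField.onEndEdit.AddListener(question =>
        {
            interaction.AskBlacksmith(question);
            questionField.text = ""; // clear the box for the next question
        });
    }
}
```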
Performance Note: On mobile devices, use smaller “Tiny” models (1B-2B parameters) to prevent overheating and ensure a steady frame rate.
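One way to act on that note without touching LLMUnity internals, as a sketch: keep two pre-configured `LLM` GameObjects (one desktop-sized, one tiny) and activate the right one per platform. Everything below is standard Unity API; the object names are hypothetical:

```csharp
using UnityEngine;

// Hypothetical helper: swap between a desktop and a mobile LLM configuration at startup.
public class ModelSelector : MonoBehaviour
{
    [SerializeField] GameObject desktopLLM; // LLM component configured with a 7B model
    [SerializeField] GameObject mobileLLM;  // LLM component configured with a 1B-2B "tiny" model

    void Awake()
    {
#if UNITY_ANDROID || UNITY_IOS
        desktopLLM.SetActive(false);
        mobileLLM.SetActive(true);
#else
        desktopLLM.SetActive(true);
        mobileLLM.SetActive(false);
#endif
    }
}
```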
2. The Body: Physics & Behavior with ML-Agents and Sentis
While LLMs handle text, ML-Agents handles movement and decision-making. Instead of programming how an enemy walks, you train it to walk.
The Pipeline: Train in Python -> Run in Unity
- Training (The Gym): You use the ML-Agents Toolkit (Python-based) to train your agents in a headless build of your game. You reward them for correct actions (e.g., +1 for hitting the player) and punish them for failure (a minimal agent sketch follows this list).
- Inference (The Game): Once trained, you export the “brain” as an `.onnx` file.
- Unity Sentis: This is Unity’s neural network inference engine (formerly Barracuda). It takes that `.onnx` file and runs it on the player’s GPU or NPU (Neural Processing Unit).
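To make the training side concrete, here is a minimal, hypothetical `Agent` sketch using the ML-Agents C# API (`CollectObservations`, `OnActionReceived`, `AddReward`); the observation layout, reward values, and `player` reference are illustrative assumptions:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical training-side agent: observations in, actions out, rewards shape the behavior.
public class ChaserAgent : Agent
{
    [SerializeField] Transform player; // assumed scene reference
    [SerializeField] float moveSpeed = 3f;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Four observations: own position plus the offset to the player
        sensor.AddObservation(transform.localPosition.x);
        sensor.AddObservation(transform.localPosition.z);
        sensor.AddObservation(player.localPosition.x - transform.localPosition.x);
        sensor.AddObservation(player.localPosition.z - transform.localPosition.z);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Two continuous actions: move on X and Z
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.Translate(new Vector3(moveX, 0, moveZ) * moveSpeed * Time.deltaTime);

        // Reward shaping: +1 for reaching the player, small time penalty to encourage speed
        if (Vector3.Distance(transform.localPosition, player.localPosition) < 1f)
        {
            AddReward(1f);
            EndEpisode();
        }
        else
        {
            AddReward(-0.001f);
        }
    }
}
```

From there, training runs on the Python side via the `mlagents-learn` CLI, and the resulting `.onnx` brain is what Sentis loads below.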
Quick Setup for Inference
- Install Sentis from the Package Manager (Registry).
- Import your `.onnx` model into the Assets folder.
- Use the `Worker` to run the model:
```csharp
using UnityEngine;
using Unity.Sentis;

public class NeuralEnemy : MonoBehaviour
{
    public ModelAsset modelAsset;
    public float health;     // game state fed to the network
    public float playerDist; // distance to the player, updated elsewhere
    private IWorker worker;

    void Start()
    {
        Model model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    void Update()
    {
        // Convert game state (position, health) to a Tensor
        using TensorFloat inputTensor = new TensorFloat(new TensorShape(1, 4),
            new[] { transform.position.x, transform.position.y, health, playerDist });

        // Run the neural network
        worker.Execute(inputTensor);

        // Get output (e.g., move direction)
        TensorFloat outputTensor = worker.PeekOutput() as TensorFloat;
        outputTensor.MakeReadable();
        float moveX = outputTensor[0];

        // Apply movement
        transform.Translate(new Vector3(moveX, 0, 0) * Time.deltaTime);
    }

    void OnDisable()
    {
        worker.Dispose();
    }
}
```
3. The Workflow: Coding with Cursor AI
To integrate these complex systems, you need a powerful IDE. Cursor AI (a VS Code fork) is rapidly becoming the favorite for Unity developers because it can “read” your entire codebase.
How to Connect Cursor to Unity
Cursor doesn’t yet appear in Unity’s External Script Editor dropdown by default, so follow these steps:
- Install the Cursor Integration Package: Use the community package `com.boxqkrtm.ide.cursor` to generate the correct `.csproj` files.
- Set External Editor: In Unity, go to `Edit > Preferences > External Tools`.
  - External Script Editor: Select “Browse…” and find the Cursor executable (e.g., `AppData\Local\Programs\cursor\Cursor.exe` on Windows).
  - External Script Editor Args: `$(File)`.
- The `.cursorrules` File: Create a file named `.cursorrules` in the root of your Unity project. Paste this context so Cursor understands Unity’s quirks:
```
You are an expert Unity C# developer.
- Always use `SerializeField` instead of public variables for Inspector exposure.
- Use `TryGetComponent` instead of `GetComponent` for performance.
- Avoid `FindObjectOfType` in Update loops.
- Use `Mathf.Approximately` for float comparisons.
```
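For illustration, this is the style of code those rules nudge Cursor toward (the class itself is made up):

```csharp
using UnityEngine;

// Hypothetical output: code following the rules above.
public class HealthBar : MonoBehaviour
{
    [SerializeField] private float maxHealth = 100f; // SerializeField instead of public

    void Start()
    {
        // TryGetComponent avoids the editor-only allocation when the component is missing
        if (TryGetComponent(out Rigidbody body))
        {
            body.mass = 2f;
        }
    }

    bool IsFull(float current)
    {
        // Mathf.Approximately instead of == for floats
        return Mathf.Approximately(current, maxHealth);
    }
}
```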
Performance Optimization for AI
Running Neural Networks costs frame time. Here is how to keep your game running at 60 FPS:
- Asynchronous Inference: Never run `worker.Execute()` or `llm.Chat()` on the main thread without async/await patterns. LLMUnity handles this by default, but for Sentis, use Coroutines or C# Jobs (see the coroutine sketch after this list).
- Quantization: Always use quantized models (Q4_K_M or Q8). A 16-bit float model is twice as heavy as an 8-bit integer model, and the accuracy loss from quantization is negligible.
- NPU Utilization: Unity Sentis 2023+ can target the Neural Processing Unit on newer chips (Apple Silicon, Snapdragon), freeing up the GPU for rendering.
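As a sketch of the coroutine approach from the list above: cache the network’s output and refresh it on a timer instead of every frame. It reuses only the Sentis calls shown earlier; the 0.1 s interval and placeholder state values are assumptions:

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

// Hypothetical throttle: run inference ~10 times per second instead of every frame,
// while movement code reads the cached result each frame.
public class ThrottledInference : MonoBehaviour
{
    [SerializeField] ModelAsset modelAsset;
    IWorker worker;
    public float MoveX { get; private set; } // cached output, read by movement code

    IEnumerator Start()
    {
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, ModelLoader.Load(modelAsset));
        var wait = new WaitForSeconds(0.1f);
        while (enabled)
        {
            using (TensorFloat input = new TensorFloat(new TensorShape(1, 4),
                new float[] { transform.position.x, transform.position.y, 0f, 0f })) // placeholder state
            {
                worker.Execute(input);
                TensorFloat output = worker.PeekOutput() as TensorFloat;
                output.MakeReadable(); // still a blocking readback, but now only ~10x per second
                MoveX = output[0];
            }
            yield return wait;
        }
    }

    void OnDisable() => worker?.Dispose();
}
```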
The Future: Runtime Generation
We are transitioning from static assets to runtime generation. In the near future, you won’t just integrate an LLM for chat; you will use it to generate quests, spawn enemies, and even texture 3D models on the fly using tools like Unity Muse.
By mastering LLMUnity and Sentis today, you are future-proofing your skills for the next decade of game development.