
Note: This article covers advanced integration techniques for Unity 6 and 2022 LTS.
The era of scripted state machines is fading. We are moving from “Artificial Intelligence” that is merely a series of if-else statements to actual intelligence—neural networks that run in real-time within your game loop.
If you are a Unity developer asking “How do I put AI in my game?”, the answer is no longer just “A* Pathfinding.” Today, it means embedding Large Language Models (LLMs) for dynamic dialogue, using Reinforcement Learning for complex physics, and leveraging on-device inference engines.
This guide covers the three pillars of modern Unity AI integration:
1. The Brain: Local LLMs with LLMUnity.
2. The Body: Physics and movement with ML-Agents and Sentis.
3. The Workflow: Coding faster with Cursor AI.
1. The Brain: Integrating Local LLMs with LLMUnity
Cloud-based APIs like OpenAI are great, but they introduce latency and costs, and they require an internet connection. For games, Local LLMs are the gold standard. They run entirely on the player’s device, ensuring privacy and eliminating network latency.
The best tool for this in 2025 is LLMUnity, a wrapper around llama.cpp that allows you to run quantized models (like Llama-3 or Mistral) directly inside Unity.
Step-by-Step Implementation
- Install the Package:
  - Open the Unity Package Manager.
  - Click `+` > “Add package from git URL”.
  - Enter: `https://github.com/undreamai/LLMUnity.git`
- Setup the Scene:
  - Create an empty GameObject named `AI_Manager`.
  - Add the `LLM` component to it.
  - Click “Download Model” in the inspector to fetch a quantized model (e.g., `Mistral-7B-Instruct-v0.2.Q4_K_M.gguf`). This model is optimized for performance (~4GB RAM).
- Create the Character:
  - Create your NPC GameObject.
  - Add the `LLMCharacter` script.
  - Link the `AI_Manager` to the `LLM` field.
  - Prompt Engineering: In the “Prompt” field, define the persona:
    > “You are Eldric, a grumpy blacksmith who hates magic. Speak in old English.”
- The C# Script: Here is how to interact with it via code:
```csharp
using UnityEngine;
using LLMUnity;

public class NPCInteraction : MonoBehaviour
{
    public LLMCharacter eldricAI;

    public void AskBlacksmith(string playerQuestion)
    {
        // HandleReply is the callback called as the AI generates text;
        // ReplyCompleted fires once generation finishes
        _ = eldricAI.Chat(playerQuestion, HandleReply, ReplyCompleted);
    }

    void HandleReply(string response)
    {
        Debug.Log($"Eldric says: {response}");
        // Hook this into your UI text box here
    }

    void ReplyCompleted()
    {
        Debug.Log("Eldric has finished speaking.");
    }
}
```
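To see it in gameplay terms, here is a minimal sketch that feeds a uGUI `InputField` into the script above. The field names and scene wiring are assumptions for illustration:

```csharp
using UnityEngine;
using UnityEngine.UI;

// Hypothetical wiring: forward the player's typed question to the blacksmith
// when they finish editing the input field.
public class DialogueInput : MonoBehaviour
{
    [SerializeField] InputField questionField;   // assumes a uGUI InputField in the scene
    [SerializeField] NPCInteraction interaction; // the script from the previous example

    void Start()
    {
        questionField.onEndEdit.AddListener(question =>
        {
            interaction.AskBlacksmith(question);
            questionField.text = ""; // clear the box for the next question
        });
    }
}
```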
Performance Note: On mobile devices, use smaller “Tiny” models (1B-2B parameters) to prevent overheating and ensure a steady frame rate.
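One way to act on that note without touching LLMUnity internals, as a sketch: keep two pre-configured `LLM` GameObjects (one desktop-sized, one tiny) and activate the right one per platform. Everything below is standard Unity API; the object names are hypothetical:

```csharp
using UnityEngine;

// Hypothetical helper: swap between a desktop and a mobile LLM configuration at startup.
public class ModelSelector : MonoBehaviour
{
    [SerializeField] GameObject desktopLLM; // LLM component configured with a 7B model
    [SerializeField] GameObject mobileLLM;  // LLM component configured with a 1B-2B "tiny" model

    void Awake()
    {
#if UNITY_ANDROID || UNITY_IOS
        desktopLLM.SetActive(false);
        mobileLLM.SetActive(true);
#else
        desktopLLM.SetActive(true);
        mobileLLM.SetActive(false);
#endif
    }
}
```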
2. The Body: Physics & Behavior with ML-Agents and Sentis
While LLMs handle text, ML-Agents handles movement and decision-making. Instead of programming how an enemy walks, you train it to walk.
The Pipeline: Train in Python -> Run in Unity
- Training (The Gym): You use the ML-Agents Toolkit (Python-based) to train your agents in a headless build of your game. You reward them for correct actions (e.g., +1 for hitting the player) and punish them for failure (a minimal agent sketch follows this list).
- Inference (The Game): Once trained, you export the “brain” as an `.onnx` file.
- Unity Sentis: This is Unity’s neural network inference engine (formerly Barracuda). It takes that `.onnx` file and runs it on the player’s GPU or NPU (Neural Processing Unit).
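To make the training side concrete, here is a minimal, hypothetical `Agent` sketch using the ML-Agents C# API (`CollectObservations`, `OnActionReceived`, `AddReward`); the observation layout, reward values, and `player` reference are illustrative assumptions:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical training-side agent: observations in, actions out, rewards shape the behavior.
public class ChaserAgent : Agent
{
    [SerializeField] Transform player; // assumed scene reference
    [SerializeField] float moveSpeed = 3f;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Four observations: own position plus the offset to the player
        sensor.AddObservation(transform.localPosition.x);
        sensor.AddObservation(transform.localPosition.z);
        sensor.AddObservation(player.localPosition.x - transform.localPosition.x);
        sensor.AddObservation(player.localPosition.z - transform.localPosition.z);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Two continuous actions: move on X and Z
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.Translate(new Vector3(moveX, 0, moveZ) * moveSpeed * Time.deltaTime);

        // Reward shaping: +1 for reaching the player, small time penalty to encourage speed
        if (Vector3.Distance(transform.localPosition, player.localPosition) < 1f)
        {
            AddReward(1f);
            EndEpisode();
        }
        else
        {
            AddReward(-0.001f);
        }
    }
}
```

From there, training runs on the Python side via the `mlagents-learn` CLI, and the resulting `.onnx` brain is what Sentis loads below.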
Quick Setup for Inference
- Install Sentis from the Package Manager (Registry).
- Import your `.onnx` model into the Assets folder.
- Use the `Worker` to run the model:
```csharp
using UnityEngine;
using Unity.Sentis;

public class NeuralEnemy : MonoBehaviour
{
    public ModelAsset modelAsset;
    public float health;     // game state fed to the network
    public float playerDist; // distance to the player, updated elsewhere
    private IWorker worker;

    void Start()
    {
        Model model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    void Update()
    {
        // Convert game state (position, health) to a Tensor
        using TensorFloat inputTensor = new TensorFloat(new TensorShape(1, 4),
            new[] { transform.position.x, transform.position.y, health, playerDist });

        // Run the neural network
        worker.Execute(inputTensor);

        // Get output (e.g., move direction)
        TensorFloat outputTensor = worker.PeekOutput() as TensorFloat;
        outputTensor.MakeReadable();
        float moveX = outputTensor[0];

        // Apply movement
        transform.Translate(new Vector3(moveX, 0, 0) * Time.deltaTime);
    }

    void OnDisable()
    {
        worker.Dispose();
    }
}
```
3. The Workflow: Coding with Cursor AI
To integrate these complex systems, you need a powerful IDE. Cursor AI (a VS Code fork) is rapidly becoming the favorite for Unity developers because it can “read” your entire codebase.
How to Connect Cursor to Unity
Cursor doesn’t yet appear in Unity’s External Script Editor dropdown by default, so follow these steps:
- Install the Cursor Integration Package: Use the community package `com.boxqkrtm.ide.cursor` to generate the correct `.csproj` files.
- Set External Editor: In Unity, go to `Edit > Preferences > External Tools`.
  - External Script Editor: Select “Browse…” and find the Cursor executable (e.g., `AppData\Local\Programs\cursor\Cursor.exe` on Windows).
  - External Script Editor Args: `$(File)`.
- The `.cursorrules` File: Create a file named `.cursorrules` in the root of your Unity project. Paste this context so Cursor understands Unity’s quirks:
```
You are an expert Unity C# developer.
- Always use `SerializeField` instead of public variables for Inspector exposure.
- Use `TryGetComponent` instead of `GetComponent` for performance.
- Avoid `FindObjectOfType` in Update loops.
- Use `Mathf.Approximately` for float comparisons.
```
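For illustration, this is the style of code those rules nudge Cursor toward (the class itself is made up):

```csharp
using UnityEngine;

// Hypothetical output: code following the rules above.
public class HealthBar : MonoBehaviour
{
    [SerializeField] private float maxHealth = 100f; // SerializeField instead of public

    void Start()
    {
        // TryGetComponent avoids the editor-only allocation when the component is missing
        if (TryGetComponent(out Rigidbody body))
        {
            body.mass = 2f;
        }
    }

    bool IsFull(float current)
    {
        // Mathf.Approximately instead of == for floats
        return Mathf.Approximately(current, maxHealth);
    }
}
```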
Performance Optimization for AI
Running Neural Networks costs frame time. Here is how to keep your game running at 60 FPS:
- Asynchronous Inference: Never run `worker.Execute()` or `llm.Chat()` on the main thread without async/await patterns. LLMUnity handles this by default, but for Sentis, use Coroutines or C# Jobs (see the coroutine sketch after this list).
- Quantization: Always use quantized models (Q4_K_M or Q8). A 16-bit float model is twice as heavy as an 8-bit integer model, and the accuracy loss from quantization is negligible.
- NPU Utilization: Unity Sentis 2023+ can target the Neural Processing Unit on newer chips (Apple Silicon, Snapdragon), freeing up the GPU for rendering.
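As a sketch of the coroutine approach from the list above: cache the network’s output and refresh it on a timer instead of every frame. It reuses only the Sentis calls shown earlier; the 0.1 s interval and placeholder state values are assumptions:

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

// Hypothetical throttle: run inference ~10 times per second instead of every frame,
// while movement code reads the cached result each frame.
public class ThrottledInference : MonoBehaviour
{
    [SerializeField] ModelAsset modelAsset;
    IWorker worker;
    public float MoveX { get; private set; } // cached output, read by movement code

    IEnumerator Start()
    {
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, ModelLoader.Load(modelAsset));
        var wait = new WaitForSeconds(0.1f);
        while (enabled)
        {
            using (TensorFloat input = new TensorFloat(new TensorShape(1, 4),
                new float[] { transform.position.x, transform.position.y, 0f, 0f })) // placeholder state
            {
                worker.Execute(input);
                TensorFloat output = worker.PeekOutput() as TensorFloat;
                output.MakeReadable(); // still a blocking readback, but now only ~10x per second
                MoveX = output[0];
            }
            yield return wait;
        }
    }

    void OnDisable() => worker?.Dispose();
}
```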
The Future: Runtime Generation
We are transitioning from static assets to runtime generation. In the near future, you won’t just integrate an LLM for chat; you will use it to generate quests, spawn enemies, and even texture 3D models on the fly using tools like Unity Muse.
By mastering LLMUnity and Sentis today, you are future-proofing your skills for the next decade of game development.