Ollama API Documentation

API based on ad22ace439eb3fab7230134e56bb6276a78347e4

Endpoints

Endpoint	Method	Description
`/api/generate`	POST	Generate a completion
`/api/chat`	POST	Generate a chat completion
`/api/create`	POST	Create a model
`/api/tags`	GET	List local models
`/api/show`	POST	Show model information
`/api/copy`	POST	Copy a model
`/api/delete`	DELETE	Delete a model
`/api/pull`	POST	Pull a model
`/api/push`	POST	Push a model
`/api/embed`	POST	Generate embeddings
`/api/ps`	GET	List running models
`/api/version`	GET	Version

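Conventions

Model names

Model names follow a model:tag format, where model can have an optional namespace such as example/model. Some examples are orca-mini:3b-q4_1 and llama3:70b. The tag is optional and, if not provided, defaults to latest. The tag is used to identify a specific version.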
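
The naming rule above can be sketched as a small parser. This is an illustrative helper only: `parse_model_name` and `ModelName` are hypothetical names, not part of the Ollama API, and the sketch assumes the simple namespace/model:tag shape described here (it does not handle registry hosts with ports).

```python
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    namespace: Optional[str]  # None when no namespace is given
    model: str
    tag: str


def parse_model_name(name: str) -> ModelName:
    """Split 'namespace/model:tag' into parts, defaulting the tag to 'latest'."""
    base, sep, tag = name.partition(":")
    if not sep or not tag:
        tag = "latest"  # the tag is optional and defaults to latest
    namespace, slash, model = base.rpartition("/")
    return ModelName(namespace if slash else None, model, tag)


print(parse_model_name("orca-mini:3b-q4_1"))
print(parse_model_name("example/model"))
```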

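Durations

All durations are returned in nanoseconds.

Streaming responses

Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing {"stream": false} for these endpoints.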
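
A minimal sketch of consuming such a stream on the client side. It assumes each streamed JSON object arrives on its own line and carries the `response` and `done` fields shown in the /api/generate examples; `collect_stream` is a hypothetical helper name, and the input lines here are hand-written rather than read from a live server.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, dict]:
    """Join the 'response' fragments of a streamed reply.

    Returns the concatenated text and the final object (the one with done=true).
    """
    parts = []
    final: dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            final = obj
            break
    return "".join(parts), final


sample = [
    '{"response": "The", "done": false}',
    '{"response": " sky", "done": false}',
    '{"response": "", "done": true, "eval_count": 2}',
]
text, final = collect_stream(sample)
print(text, final["eval_count"])
```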

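Generate a completion

POST /api/generate

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

Parameters

model: (required) the model name
prompt: the prompt to generate a response for
suffix: the text after the model response
images: (optional) a list of base64-encoded images (for multimodal models such as llava)

Advanced parameters (optional):

format: the format to return a response in. Format can be json or a JSON schema
options: additional model parameters listed in the Modelfile documentation, such as temperature
system: system message (overrides what is defined in the Modelfile)
template: the prompt template to use (overrides what is defined in the Modelfile)
stream: if set to false, the response will be returned as a single response object rather than a stream of objects
raw: if set to true, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)
context (deprecated): the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory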
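
As a rough illustration, the parameters above map onto a JSON request body. The sketch below only builds the request with the Python standard library and does not send it; `build_generate_request` is a hypothetical helper, the host and port are taken from the curl examples in this document, and defaulting `stream` to false is a choice made for this sketch (the endpoint itself streams by default).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # host and port as used in the curl examples


def build_generate_request(model, prompt, *, stream=False, **extra):
    """Build (but do not send) a POST request for /api/generate.

    Extra keyword arguments such as options, format, or keep_alive are
    copied into the JSON body unchanged.
    """
    payload = {"model": model, "prompt": prompt, "stream": stream}
    payload.update(extra)
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request(
    "llama3.2", "Why is the sky blue?", options={"temperature": 0}
)
print(req.get_method(), req.full_url)
```

With a server running, the request could be sent with `urllib.request.urlopen(req)` and the body decoded with `json.load`.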

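Structured outputs

Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the structured outputs example below.

JSON mode

Enable JSON mode by setting the format parameter to json. This will structure the response as a valid JSON object. See the JSON mode example below.

Important: instruct the model to use JSON in the prompt. Otherwise, the model may generate large amounts of whitespace.

Examples

Generate request (Streaming)

Request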

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'

Response

A stream of JSON objects is returned:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "response": "Because",
  "done": false
}

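The final response in the stream also includes additional data about the generation:

total_duration: total time spent generating the response (in nanoseconds)
load_duration: time spent loading the model (in nanoseconds)
prompt_eval_count: number of tokens in the prompt
prompt_eval_duration: time spent evaluating the prompt (in nanoseconds)
eval_count: number of tokens in the response
eval_duration: time spent generating the response (in nanoseconds)
context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
response: empty if the response was streamed; if not streamed, this contains the full response

To calculate how fast the response is generated in tokens per second, compute eval_count / eval_duration * 10^9.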

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 10706818083,
  "load_duration": 6338219291,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 130079000,
  "eval_count": 259,
  "eval_duration": 4232710000
}
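
A small worked example of the tokens-per-second formula, using the counts and durations from the final response object above. The helper names `tokens_per_second` and `ns_to_seconds` are illustrative only.

```python
def ns_to_seconds(ns: int) -> float:
    """Convert a duration reported in nanoseconds to seconds."""
    return ns / 1e9


def tokens_per_second(final: dict) -> float:
    """eval_count / eval_duration * 10^9, as described in the text."""
    return final["eval_count"] / final["eval_duration"] * 1e9


final = {
    "total_duration": 10706818083,
    "eval_count": 259,
    "eval_duration": 4232710000,
}
print(round(tokens_per_second(final), 2))  # about 61.19 tokens per second
print(ns_to_seconds(final["total_duration"]))
```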

Request (No streaming)

Request

When streaming is turned off, the response is received in a single reply.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Response

If stream is set to false, the response will be a single JSON object:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because it is the color of the sky.",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 5043500667,
  "load_duration": 5025959,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 325953000,
  "eval_count": 290,
  "eval_duration": 4709213000
}

Request (with suffix)

Request

curl http://localhost:11434/api/generate -d '{
  "model": "codellama:code",
  "prompt": "def compute_gcd(a, b):",
  "suffix": "    return result",
  "options": {
    "temperature": 0
  },
  "stream": false
}'

Response

{
  "model": "codellama:code",
  "created_at": "2024-07-22T20:47:51.147561Z",
  "response": "\n  if a == 0:\n    return b\n  else:\n    return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n  result = (a * b) / compute_gcd(a, b)\n",
  "done": true,
  "done_reason": "stop",
  "context": [...],
  "total_duration": 1162761250,
  "load_duration": 6683708,
  "prompt_eval_count": 17,
  "prompt_eval_duration": 201222000,
  "eval_count": 63,
  "eval_duration": 953997000
}

Request (Structured outputs)

Request

curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
  "model": "llama3.1:8b",
  "prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "age": {
        "type": "integer"
      },
      "available": {
        "type": "boolean"
      }
    },
    "required": [
      "age",
      "available"
    ]
  }
}'

Response

{
  "model": "llama3.1:8b",
  "created_at": "2024-12-06T00:48:09.983619Z",
  "response": "{\n  \"age\": 22,\n  \"available\": true\n}",
  "done": true,
  "done_reason": "stop",
  "context": [1, 2, 3],
  "total_duration": 1075509083,
  "load_duration": 567678166,
  "prompt_eval_count": 28,
  "prompt_eval_duration": 236000000,
  "eval_count": 16,
  "eval_duration": 269000000
}

Request (JSON mode)

Important: when format is set to json, the output will always be a well-formed JSON object. It is also important to instruct the model to respond in JSON.

Request

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}'

Response

{
  "model": "llama3.2",
  "created_at": "2023-11-09T21:07:55.186497Z",
  "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 4648158584,
  "load_duration": 4071084,
  "prompt_eval_count": 36,
  "prompt_eval_duration": 439038000,
  "eval_count": 180,
  "eval_duration": 4196918000
}

The value of response will be a string containing JSON similar to:

{
  "morning": {
    "color": "blue"
  },
  "noon": {
    "color": "blue-gray"
  },
  "afternoon": {
    "color": "warm gray"
  },
  "evening": {
    "color": "orange"
  }
}
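
Because the `response` field is itself a string, a client has to decode it a second time to get at the data. A minimal sketch, assuming the reply has already been parsed into a dict; `parse_json_response` is a hypothetical helper name.

```python
import json


def parse_json_response(reply: dict) -> dict:
    """Decode the JSON text carried in the 'response' field of a reply."""
    data = json.loads(reply["response"])
    if not isinstance(data, dict):
        raise ValueError("expected the model to return a JSON object")
    return data


# A shortened version of the reply shown in this example.
reply = {"response": '{\n"morning": {\n"color": "blue"\n}\n}\n', "done": True}
print(parse_json_response(reply)["morning"]["color"])  # prints: blue
```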

Request (with images)

To submit images to multimodal models such as llava or bakllava, provide a list of base64-encoded images:

Request

curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt":"What is in this picture?",
  "stream": false,
  "images": ["iVBORw0KGgoAAA...(skipped)..."]
}'

Response

{
  "model": "llava",
  "created_at": "2023-11-03T15:36:02.583064Z",
  "response": "A happy cartoon character, which is cute and cheerful.",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 2938432250,
  "load_duration": 2559292,
  "prompt_eval_count": 1,
  "prompt_eval_duration": 2195557000,
  "eval_count": 44,
  "eval_duration": 736432000
}
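
One way to produce the base64 strings for the images list is sketched below. The helper names are hypothetical, and the eight bytes used here are only the PNG file signature standing in for real image data.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Return the base64 text form of raw image bytes."""
    return base64.b64encode(data).decode("ascii")


def build_image_payload(model: str, prompt: str, image: bytes) -> dict:
    """Assemble a non-streaming /api/generate body with one image attached."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [encode_image_bytes(image)],
    }


png_signature = b"\x89PNG\r\n\x1a\n"  # placeholder for real file contents
payload = build_image_payload("llava", "What is in this picture?", png_signature)
print(payload["images"][0])  # prints: iVBORw0KGgo=
```

For a file on disk, the bytes could be read with `pathlib.Path(path).read_bytes()` before encoding.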

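Request (Raw mode)

In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the raw parameter to disable templating. Also note that raw mode will not return a context.

Request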

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "[INST] why is the sky blue? [/INST]",
  "raw": true,
  "stream": false
}'

Request (Reproducible outputs)

For reproducible outputs, set seed to a number:

Request

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "options": {
    "seed": 123
  }
}'

Response

{
  "model": "mistral",
  "created_at": "2023-11-03T15:36:02.583064Z",
  "response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
  "done": true,
  "total_duration": 8493852375,
  "load_duration": 6589624375,
  "prompt_eval_count": 14,
  "prompt_eval_duration": 119039000,
  "eval_count": 110,
  "eval_duration": 1779061000
}

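Generate request (with options)

If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the options parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.

Request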

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "num_thread": 8
  }
}'

Response

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because it is the color of the sky.",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 4935886791,
  "load_duration": 534986708,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 107345000,
  "eval_count": 237,
  "eval_duration": 4289432000
}

Load a model

If an empty prompt is provided, the model will be loaded into memory.

Request

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2"
}'

Response

A single JSON object is returned:

{
  "model": "llama3.2",
  "created_at": "2023-12-18T19:52:07.071755Z",
  "response": "",
  "done": true
}

Unload a model

If an empty prompt is provided and the keep_alive parameter is set to 0, the model will be unloaded from memory.

Request

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "keep_alive": 0
}'

Response

A single JSON object is returned:

{
  "model": "llama3.2",
  "created_at": "2024-09-12T03:54:03.516566Z",
  "response": "",
  "done": true,
  "done_reason": "unload"
}

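Generate a chat completion

POST /api/chat

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.

Parameters

model: (required) the model name
messages: the messages of the chat; this can be used to keep a chat memory
tools: a list of tools, provided as JSON, that the model may use if it supports them

The message object has the following fields:

role: the role of the message, either system, user, assistant, or tool
content: the content of the message
images: (optional) a list of images to include in the message (for multimodal models such as llava)
tool_calls: (optional) a list of tools, provided as JSON, that the model wants to use

Advanced parameters (optional):

format: the format to return a response in. Format can be json or a JSON schema
options: additional model parameters listed in the Modelfile documentation, such as temperature
stream: if set to false, the response will be returned as a single response object rather than a stream of objects
keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)

Structured outputs

Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the chat request (structured outputs) example below.

Examples

Chat request (Streaming)

Request

Send a chat message with a streaming response.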

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'

Response

A stream of JSON objects is returned:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "message": {
    "role": "assistant",
    "content": "The sky",
    "images": null
  },
  "done": false
}

Final response in the stream:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}

Chat request (No streaming)

Request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "stream": false
}'

Response

{
  "model": "llama3.2",
  "created_at": "2023-12-12T14:13:43.416799Z",
  "message": {
    "role": "assistant",
    "content": "Hello! How are you today?"
  },
  "done": true,
  "total_duration": 5191566416,
  "load_duration": 2154458,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 383809000,
  "eval_count": 298,
  "eval_duration": 4799921000
}

Chat request (Structured outputs)

Request

curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "age": {
        "type": "integer"
      },
      "available": {
        "type": "boolean"
      }
    },
    "required": [
      "age",
      "available"
    ]
  },
  "options": {
    "temperature": 0
  }
}'

Response

{
  "model": "llama3.1",
  "created_at": "2024-12-06T00:46:58.265747Z",
  "message": {
    "role": "assistant",
    "content": "{\"age\": 22, \"available\": false}"
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 2254970291,
  "load_duration": 574751416,
  "prompt_eval_count": 34,
  "prompt_eval_duration": 1502000000,
  "eval_count": 12,
  "eval_duration": 175000000
}

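Chat request (with history)

Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.

Request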

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "Because of Rayleigh scattering."
    },
    {
      "role": "user",
      "content": "How is that different from Mie scattering?"
    }
  ]
}'

Response

A stream of JSON objects is returned:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "message": {
    "role": "assistant",
    "content": "The"
  },
  "done": false
}

Final response:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 8113331500,
  "load_duration": 6396458,
  "prompt_eval_count": 61,
  "prompt_eval_duration": 398801000,
  "eval_count": 468,
  "eval_duration": 7701267000
}
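
Keeping a conversation going means sending the earlier messages back with each request. The sketch below shows one possible way to fold streamed chunks into an assistant message and build the next request body; `merge_chat_chunks` and `next_chat_request` are hypothetical helpers, and the chunks are hand-written rather than taken from a live server.

```python
from typing import Dict, Iterable, List

Message = Dict[str, str]


def merge_chat_chunks(chunks: Iterable[dict]) -> Message:
    """Concatenate message.content across streamed chunks.

    The final chunk may carry no message at all, so missing keys are
    treated as empty text.
    """
    content = "".join(c.get("message", {}).get("content", "") for c in chunks)
    return {"role": "assistant", "content": content}


def next_chat_request(
    model: str, history: List[Message], reply: Message, user_text: str
) -> dict:
    """Build the next /api/chat body: prior history, the reply, the new question."""
    messages = history + [reply, {"role": "user", "content": user_text}]
    return {"model": model, "messages": messages}


history = [{"role": "user", "content": "Why is the sky blue?"}]
chunks = [
    {"message": {"role": "assistant", "content": "Because of "}, "done": False},
    {"message": {"role": "assistant", "content": "Rayleigh scattering."}, "done": False},
    {"done": True, "eval_count": 7},
]
reply = merge_chat_chunks(chunks)
body = next_chat_request(
    "llama3.2", history, reply, "How is that different from Mie scattering?"
)
print(len(body["messages"]))  # prints: 3
```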

Chat request (with images)

Request

Send a chat message with images. The images should be provided as an array, with each image encoded in Base64.

curl http://localhost:11434/api/chat -d '{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAA...(skipped)..."]
    }
  ]
}'

Response

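{
  "model": "llava",
  "created_at": "2023-12-13T22:42:50.203334Z",
  "message": {
    "role": "assistant",
    "content": " The image shows a cute little pig with a somewhat angry expression. It is wearing a shirt with a heart pattern and is waving. This appears to be part of a drawing or sketching project.",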
    "images": null
  },
  "done": true,
  "total_duration": 1668506709,
  "load_duration": 1986209,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 359682000,
  "eval_count": 83,
  "eval_duration": 1303285000
}

Chat request (Reproducible outputs)

Request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'

Response

{
  "model": "llama3.2",
  "created_at": "2023-12-12T14:13:43.416799Z",
  "message": {
    "role": "assistant",
    "content": "Hello! How are you today?"
  },
  "done": true,
  "total_duration": 5191566416,
  "load_duration": 2154458,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 383809000,
  "eval_count": 298,
  "eval_duration": 4799921000
}

Chat request (with tools)

Request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather today in Paris?"
    }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}'

Response

{
  "model": "llama3.2",
  "created_at": "2024-07-22T20:33:28.123648Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "format": "celsius",
            "location": "Paris, FR"
          }
        }
      }
    ]
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 885095291,
  "load_duration": 3753500,
  "prompt_eval_count": 122,
  "prompt_eval_duration": 328493000,
  "eval_count": 33,
  "eval_duration": 552222000
}
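
The response only names the tool and its arguments; running it is left to the client. A minimal dispatch sketch follows. The `get_current_weather` stub, the `run_tool_calls` helper, and the shape of the resulting `tool` message (role plus string content) are all assumptions made for illustration, not behaviour confirmed by this document.

```python
from typing import Callable, Dict, List


def get_current_weather(location: str, format: str) -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"22 degrees {format} in {location}"


def run_tool_calls(message: dict, registry: Dict[str, Callable]) -> List[dict]:
    """Call each requested function and wrap its output as a tool message."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = registry[fn["name"]]  # KeyError if the tool is unknown
        output = handler(**fn["arguments"])
        results.append({"role": "tool", "content": str(output)})
    return results


message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "function": {
                "name": "get_current_weather",
                "arguments": {"format": "celsius", "location": "Paris, FR"},
            }
        }
    ],
}
print(run_tool_calls(message, {"get_current_weather": get_current_weather}))
```

The returned messages could then be appended to the messages array of a follow-up /api/chat request.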

Load a model

If the messages array is empty, the model will be loaded into memory.

Request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": []
}'

Response

{
  "model": "llama3.2",
  "created_at": "2024-09-12T21:17:29.110811Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done_reason": "load",
  "done": true
}

Unload a model

If the messages array is empty and the keep_alive parameter is set to 0, the model will be unloaded from memory.

Request

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [],
  "keep_alive": 0
}'

Response

A single JSON object is returned:

{
  "model": "llama3.2",
  "created_at": "2024-09-12T21:33:17.547535Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done_reason": "unload",
  "done": true
}

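Create a model

POST /api/create

Create a model from:

another model;
a safetensors directory; or
a GGUF file.

If you are creating a model from a safetensors directory or from a GGUF file, you must create a blob for each of the files and then use the file name and SHA256 digest associated with each blob in the files field.

Parameters

model: name of the model to create
from: (optional) name of an existing model to create the new model from
files: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
adapters: (optional) a dictionary of file names to SHA256 digests of blobs for LORA adapters
template: (optional) the prompt template for the model
license: (optional) a string or list of strings containing the license or licenses for the model
system: (optional) a string containing the system prompt for the model
parameters: (optional) a dictionary of parameters for the model (see Modelfile for a list of parameters)
messages: (optional) a list of message objects used to create a conversation
stream: (optional) if set to false, the response will be returned as a single response object rather than a stream of objects
quantize: (optional) quantize a non-quantized (e.g. float16) model

Quantization types

Type	Recommended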
q2_K
q3_K_L
q3_K_M
q3_K_S
q4_0
q4_1
q4_K_M	*
q4_K_S
q5_0
q5_1
q5_K_M
q5_K_S
q6_K
q8_0	*

Examples

Create a new model

Create a new model from an existing model.

Request

curl http://localhost:11434/api/create -d '{
  "model": "mario",
  "from": "llama3.2",
  "system": "You are Mario from Super Mario Bros."
}'

Response

A stream of JSON objects is returned:

{"status":"reading model metadata"}
{"status":"creating system layer"}
{"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
{"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
{"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
{"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
{"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
{"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
{"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
{"status":"writing manifest"}
{"status":"success"}

Quantize a model

Quantize a non-quantized model.

Request

curl http://localhost:11434/api/create -d '{
  "model": "llama3.1:quantized",
  "from": "llama3.1:8b-instruct-fp16",
  "quantize": "q4_K_M"
}'

Response

A stream of JSON objects is returned:

{"status":"quantizing F16 model to Q4_K_M"}
{"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
{"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
{"status":"using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"creating new layer sha256:455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a"}
{"status":"writing manifest"}
{"status":"success"}

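Create a model from GGUF

Create a model from a GGUF file. The files parameter should contain the file name and SHA256 digest of the GGUF file you wish to use. Use /api/blobs/:digest to push the GGUF file to the server before calling this API.

Request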

curl http://localhost:11434/api/create -d '{
  "model": "my-gguf-model",
  "files": {
    "test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
  }
}'

Response

A stream of JSON objects is returned:

{"status":"parsing GGUF"}
{"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
{"status":"writing manifest"}
{"status":"success"}

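Create a model from a Safetensors directory

The files parameter should include a dictionary of files for the safetensors model, including the file name and SHA256 digest of each file. Use /api/blobs/:digest to push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.

Request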

curl http://localhost:11434/api/create -d '{
  "model": "fred",
  "files": {
    "config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
    "generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
    "special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
    "tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
    "tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
    "model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
  }
}'

Response

A stream of JSON objects is returned:

{"status":"converting model"}
{"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
{"status":"using autodetected template llama3-instruct"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"writing manifest"}
{"status":"success"}

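Check if a blob exists

HEAD /api/blobs/:digest

Ensures that the file blob (binary large object) used to create a model exists on the server. This checks your Ollama server and not ollama.com.

Query parameters

digest: the SHA256 digest of the blob

Examples

Request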

curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

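Response

Returns 200 OK if the blob exists, or 404 Not Found if it does not.

Push a blob

POST /api/blobs/:digest

Push a file to the Ollama server to create a "blob" (binary large object).

Query parameters

digest: the expected SHA256 digest of the file

Examples

Request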

curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

Response

Returns 201 Created if the blob was successfully created, or 400 Bad Request if the digest is not as expected.
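
The digest in the URL has to match the file contents, so it is usually computed locally before pushing. A minimal sketch with the standard library, producing the sha256:<hex> form used in the examples; `blob_digest` and `file_digest` are hypothetical helper names.

```python
import hashlib
from typing import Iterable


def blob_digest(chunks: Iterable[bytes]) -> str:
    """Hash the chunks in order and return 'sha256:<hex digest>'."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return "sha256:" + h.hexdigest()


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Digest a file in 1 MiB chunks so large models need not fit in memory."""
    with open(path, "rb") as f:
        return blob_digest(iter(lambda: f.read(chunk_size), b""))


print(blob_digest([b"a", b"bc"]))
```

The value returned by `file_digest("model.gguf")` would be placed in the /api/blobs/:digest path and in the files dictionary passed to /api/create.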

List local models

GET /api/tags

List models that are available locally.

Examples

Request

curl http://localhost:11434/api/tags

Response

A single JSON object is returned.

{
  "models": [
    {
      "name": "codellama:13b",
      "modified_at": "2023-11-04T14:56:49.277302595-07:00",
      "size": 7365960935,
      "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "13B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "llama3:latest",
      "modified_at": "2023-12-07T09:32:18.757212583-08:00",
      "size": 3825819519,
      "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}

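Show model information

POST /api/show

Show information about a model, including details, modelfile, template, parameters, license, and system prompt.

Parameters

model: name of the model to show
verbose: (optional) if set to true, returns full data for verbose response fields

Examples

Request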

curl http://localhost:11434/api/show -d '{
  "model": "llama3.2"
}'

Response

{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
  "parameters": "num_keep                       24\nstop                           \"<|start_header_id|>\"\nstop                           \"<|end_header_id|>\"\nstop                           \"<|eot_id|>\"",
  "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": ["llama"],
    "parameter_size": "8.0B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.file_type": 2,
    "general.parameter_count": 8030261248,
    "general.quantization_version": 2,
    "llama.attention.head_count": 32,
    "llama.attention.head_count_kv": 8,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 32,
    "llama.context_length": 8192,
    "llama.embedding_length": 4096,
    "llama.feed_forward_length": 14336,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,
    "tokenizer.ggml.bos_token_id": 128000,
    "tokenizer.ggml.eos_token_id": 128009,
    "tokenizer.ggml.merges": [], // populated if `verbose=true`
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.pre": "llama-bpe",
    "tokenizer.ggml.token_type": [], // populated if `verbose=true`
    "tokenizer.ggml.tokens": [] // populated if `verbose=true`
  }
}

Copy a model

POST /api/copy

Copy a model. Creates a model with another name from an existing model.

Examples

Request

curl http://localhost:11434/api/copy -d '{
  "source": "llama3.2",
  "destination": "llama3-backup"
}'

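Response

Returns 200 OK if successful, or 404 Not Found if the source model does not exist.

Delete a model

DELETE /api/delete

Delete a model and its data.

Parameters

model: name of the model to delete

Examples

Request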

curl -X DELETE http://localhost:11434/api/delete -d '{
  "model": "llama3:13b"
}'

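Response

Returns 200 OK if successful, or 404 Not Found if the model to be deleted does not exist.

Pull a model

POST /api/pull

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

Parameters

model: name of the model to pull
insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
stream: (optional) if set to false, the response will be returned as a single response object rather than a stream of objects

Examples

Request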

curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'

Response

If stream is not specified, or is set to true, a stream of JSON objects is returned:

The first object is the manifest:

{
  "status": "pulling manifest"
}

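Then there is a series of downloading responses. Until any of the downloads is completed, the completed key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.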

{
  "status": "downloading digestname",
  "digest": "digestname",
  "total": 2142590208,
  "completed": 241970
}

After all the files are downloaded, the final responses are:

{
    "status": "verifying sha256 digest"
}
{
    "status": "writing manifest"
}
{
    "status": "removing any unused layers"
}
{
    "status": "success"
}

If stream is set to false, then the response is a single JSON object:

{
  "status": "success"
}
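
When streaming, the total and completed fields of the downloading responses can be turned into a progress figure. A minimal sketch that also copes with status-only objects and with a missing completed key; `pull_progress` is a hypothetical helper name.

```python
from typing import Optional


def pull_progress(event: dict) -> Optional[float]:
    """Return download progress as a percentage, or None for status-only events."""
    total = event.get("total")
    if not total:
        return None  # e.g. "pulling manifest" carries no byte counts
    # completed may be absent until some data has been downloaded
    return 100.0 * event.get("completed", 0) / total


events = [
    {"status": "pulling manifest"},
    {"status": "downloading digestname", "digest": "digestname", "total": 2142590208},
    {
        "status": "downloading digestname",
        "digest": "digestname",
        "total": 2142590208,
        "completed": 241970,
    },
]
for event in events:
    print(event["status"], pull_progress(event))
```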

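Push a model

POST /api/push

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

Parameters

model: name of the model to push, in the form <namespace>/<model>:<tag>
insecure: (optional) allow insecure connections to the library. Only use this if you are pushing to your own library during development.
stream: (optional) if set to false, the response will be returned as a single response object rather than a stream of objects

Examples

Request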

curl http://localhost:11434/api/push -d '{
  "model": "mattw/pygmalion:latest"
}'

Response

If stream is not specified, or is set to true, a stream of JSON objects is returned:

{ "status": "retrieving manifest" }

Then:

{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}

Then there is a series of uploading responses:

{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}

Finally, when the upload is complete:

{"status":"pushing manifest"}
{"status":"success"}

If stream is set to false, then the response is a single JSON object:

{ "status": "success" }

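Generate embeddings

POST /api/embed

Generate embeddings from a model.

Parameters

model: name of the model to generate embeddings from
input: text or list of texts to generate embeddings for

Advanced parameters:

truncate: truncates the end of each input to fit within the context length. Returns an error if set to false and the context length is exceeded. Defaults to true
options: additional model parameters listed in the Modelfile documentation, such as temperature
keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)

Examples

Request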

curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": "Why is the sky blue?"
}'

Response

{
  "model": "all-minilm",
  "embeddings": [
    [
      0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
      0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
    ]
  ],
  "total_duration": 14143917,
  "load_duration": 1019500,
  "prompt_eval_count": 8
}

Request (Multiple inputs)

curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": ["Why is the sky blue?", "Why is the grass green?"]
}'

Response

{
  "model": "all-minilm",
  "embeddings": [
    [
      0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
      0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
    ],
    [
      -0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
      0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481
    ]
  ]
}
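
Embeddings returned for several inputs are commonly compared with cosine similarity. That metric is a general choice made for this sketch, not something this API prescribes, and `cosine_similarity` is a hypothetical helper. The vectors below are the shortened example values from the response above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


sky = [0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
       0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348]
grass = [-0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
         0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481]
print(cosine_similarity(sky, grass))
```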

List running models

GET /api/ps

List models that are currently loaded into memory.

Examples

Request

curl http://localhost:11434/api/ps

Response

A single JSON object is returned.

{
  "models": [
    {
      "name": "mistral:latest",
      "model": "mistral:latest",
      "size": 5137025024,
      "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "7.2B",
        "quantization_level": "Q4_0"
      },
      "expires_at": "2024-06-04T14:38:31.83753-07:00",
      "size_vram": 5137025024
    }
  ]
}

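Generate embedding

Note: this endpoint has been superseded by /api/embed

POST /api/embeddings

Generate embeddings from a model.

Parameters

model: name of the model to generate embeddings from
prompt: text to generate embeddings for

Advanced parameters:

options: additional model parameters listed in the Modelfile documentation, such as temperature
keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)

Examples

Request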

curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt": "Here is an article about ollama..."
}'

Response

{
  "embedding": [
    0.5670403838157654, 0.009260174818336964, 0.23178744316101074,
    -0.2916173040866852, -0.8924556970596313, 0.8785552978515625,
    -0.34576427936553955, 0.5742510557174683, -0.04222835972905159,
    -0.137906014919281
  ]
}

Version

GET /api/version

Retrieve the Ollama version.

Examples

Request

curl http://localhost:11434/api/version

Response

{
  "version": "0.5.1"
}
