1. Sample image URL

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

2. Defining the model

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    temperature=0,
    model="qwen-vl-max",
    openai_api_base="https://dashscope.aliyuncs.com/compatible-mode/v1", 
    openai_api_key="your api key"
)
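
As a small hygiene note, you may prefer reading the key from an environment variable instead of hard-coding it. A minimal sketch, assuming the key has been exported under the (hypothetical) name DASHSCOPE_API_KEY:

import os

model = ChatOpenAI(
    temperature=0,
    model="qwen-vl-max",
    openai_api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
    openai_api_key=os.environ["DASHSCOPE_API_KEY"],  # assumes `export DASHSCOPE_API_KEY=...` was run beforehand
)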

3. Encoding the image data

import base64

import httpx

image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
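
If the download is flaky, a slightly more defensive variant (a sketch, not required for the happy path) fails fast instead of silently encoding an error page:

resp = httpx.get(image_url, timeout=30.0)
resp.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx responses
image_data = base64.b64encode(resp.content).decode("utf-8")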

4. Calling the model

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model.invoke([message])
print(response.content)

Output:

The weather in the image appears to be clear and sunny. The sky is mostly blue with some scattered clouds, indicating a pleasant day. The sunlight is bright, casting shadows and highlighting the vibrant green grass and wooden path. The overall atmosphere suggests a calm and serene outdoor setting, likely during late morning or early afternoon.
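
The same message also works with LangChain's streaming interface, which prints the description as it is generated. A minimal sketch reusing the message defined above:

# Stream the reply chunk by chunk instead of waiting for the full answer
for chunk in model.stream([message]):
    print(chunk.content, end="", flush=True)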

5. Calling the model (with an image URL)

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)

Result: the call fails. The Qwen vision endpoint tries to fetch the remote image itself and, as the 400 error below shows, times out downloading it, so plain image URLs are not reliable here; pass the image inline as base64 instead.

---------------------------------------------------------------------------

BadRequestError                           Traceback (most recent call last)

Cell In[8], line 7
      1 message = HumanMessage(
      2     content=[
      3         {"type": "text", "text": "describe the weather in this image"},
      4         {"type": "image_url", "image_url": {"url": image_url}},
      5     ],
      6 )
----> 7 response = model.invoke([message])
      8 print(response.content)


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_core\language_models\chat_models.py:286, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
    275 def invoke(
    276     self,
    277     input: LanguageModelInput,
   (...)
    281     **kwargs: Any,
    282 ) -> BaseMessage:
    283     config = ensure_config(config)
    284     return cast(
    285         ChatGeneration,
--> 286         self.generate_prompt(
    287             [self._convert_input(input)],
    288             stop=stop,
    289             callbacks=config.get("callbacks"),
    290             tags=config.get("tags"),
    291             metadata=config.get("metadata"),
    292             run_name=config.get("run_name"),
    293             run_id=config.pop("run_id", None),
    294             **kwargs,
    295         ).generations[0][0],
    296     ).message


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_core\language_models\chat_models.py:786, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
    778 def generate_prompt(
    779     self,
    780     prompts: list[PromptValue],
   (...)
    783     **kwargs: Any,
    784 ) -> LLMResult:
    785     prompt_messages = [p.to_messages() for p in prompts]
--> 786     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_core\language_models\chat_models.py:643, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    641         if run_managers:
    642             run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 643         raise e
    644 flattened_outputs = [
    645     LLMResult(generations=[res.generations], llm_output=res.llm_output)  # type: ignore[list-item]
    646     for res in results
    647 ]
    648 llm_output = self._combine_llm_outputs([res.llm_output for res in results])


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_core\language_models\chat_models.py:633, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    630 for i, m in enumerate(messages):
    631     try:
    632         results.append(
--> 633             self._generate_with_cache(
    634                 m,
    635                 stop=stop,
    636                 run_manager=run_managers[i] if run_managers else None,
    637                 **kwargs,
    638             )
    639         )
    640     except BaseException as e:
    641         if run_managers:


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_core\language_models\chat_models.py:851, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
    849 else:
    850     if inspect.signature(self._generate).parameters.get("run_manager"):
--> 851         result = self._generate(
    852             messages, stop=stop, run_manager=run_manager, **kwargs
    853         )
    854     else:
    855         result = self._generate(messages, stop=stop, **kwargs)


File d:\soft\anaconda\envs\langchain\Lib\site-packages\langchain_openai\chat_models\base.py:683, in BaseChatOpenAI._generate(self, messages, stop, run_manager, **kwargs)
    681     generation_info = {"headers": dict(raw_response.headers)}
    682 else:
--> 683     response = self.client.create(**payload)
    684 return self._create_chat_result(response, generation_info)


File d:\soft\anaconda\envs\langchain\Lib\site-packages\openai\_utils\_utils.py:274, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
    272             msg = f"Missing required argument: {quote(missing[0])}"
    273     raise TypeError(msg)
--> 274 return func(*args, **kwargs)


File d:\soft\anaconda\envs\langchain\Lib\site-packages\openai\resources\chat\completions.py:742, in Completions.create(self, messages, model, frequency_penalty, function_call, functions, logit_bias, logprobs, max_completion_tokens, max_tokens, metadata, n, parallel_tool_calls, presence_penalty, response_format, seed, service_tier, stop, store, stream, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
    704 @required_args(["messages", "model"], ["messages", "model", "stream"])
    705 def create(
    706     self,
   (...)
    739     timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    740 ) -> ChatCompletion | Stream[ChatCompletionChunk]:
    741     validate_response_format(response_format)
--> 742     return self._post(
    743         "/chat/completions",
    744         body=maybe_transform(
    745             {
    746                 "messages": messages,
    747                 "model": model,
    748                 "frequency_penalty": frequency_penalty,
    749                 "function_call": function_call,
    750                 "functions": functions,
    751                 "logit_bias": logit_bias,
    752                 "logprobs": logprobs,
    753                 "max_completion_tokens": max_completion_tokens,
    754                 "max_tokens": max_tokens,
    755                 "metadata": metadata,
    756                 "n": n,
    757                 "parallel_tool_calls": parallel_tool_calls,
    758                 "presence_penalty": presence_penalty,
    759                 "response_format": response_format,
    760                 "seed": seed,
    761                 "service_tier": service_tier,
    762                 "stop": stop,
    763                 "store": store,
    764                 "stream": stream,
    765                 "stream_options": stream_options,
    766                 "temperature": temperature,
    767                 "tool_choice": tool_choice,
    768                 "tools": tools,
    769                 "top_logprobs": top_logprobs,
    770                 "top_p": top_p,
    771                 "user": user,
    772             },
    773             completion_create_params.CompletionCreateParams,
    774         ),
    775         options=make_request_options(
    776             extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
    777         ),
    778         cast_to=ChatCompletion,
    779         stream=stream or False,
    780         stream_cls=Stream[ChatCompletionChunk],
    781     )


File d:\soft\anaconda\envs\langchain\Lib\site-packages\openai\_base_client.py:1277, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1263 def post(
   1264     self,
   1265     path: str,
   (...)
   1272     stream_cls: type[_StreamT] | None = None,
   1273 ) -> ResponseT | _StreamT:
   1274     opts = FinalRequestOptions.construct(
   1275         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1276     )
-> 1277     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))


File d:\soft\anaconda\envs\langchain\Lib\site-packages\openai\_base_client.py:954, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    951 else:
    952     retries_taken = 0
--> 954 return self._request(
    955     cast_to=cast_to,
    956     options=options,
    957     stream=stream,
    958     stream_cls=stream_cls,
    959     retries_taken=retries_taken,
    960 )


File d:\soft\anaconda\envs\langchain\Lib\site-packages\openai\_base_client.py:1058, in SyncAPIClient._request(self, cast_to, options, retries_taken, stream, stream_cls)
   1055         err.response.read()
   1057     log.debug("Re-raising status error")
-> 1058     raise self._make_status_error_from_response(err.response) from None
   1060 return self._process_response(
   1061     cast_to=cast_to,
   1062     options=options,
   (...)
   1066     retries_taken=retries_taken,
   1067 )


BadRequestError: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'param': None, 'message': '<400> InternalError.Algo.InvalidParameter: Download multimodal file timed out', 'type': 'invalid_request_error'}, 'id': 'chatcmpl-b4aad2b6-bcd4-9e75-a2c0-010087d3bb0d', 'request_id': 'b4aad2b6-bcd4-9e75-a2c0-010087d3bb0d'}
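
Since the traceback ends in an openai.BadRequestError, one pragmatic pattern is to try the URL first and fall back to sending the image inline as base64. A minimal sketch with a hypothetical helper, invoke_with_image (reuses the httpx and base64 imports from section 3):

from openai import BadRequestError

def invoke_with_image(model, prompt: str, url: str):
    """Try the remote URL first; on a 400 error, re-send the image as base64."""
    content = [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": url}},
    ]
    try:
        return model.invoke([HumanMessage(content=content)])
    except BadRequestError:
        # Download the image locally and inline it as a data URI instead
        data = base64.b64encode(httpx.get(url, timeout=30.0).content).decode("utf-8")
        content[1] = {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{data}"}}
        return model.invoke([HumanMessage(content=content)])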

6. Reading multiple images and encoding them as base64

import os
import base64
from IPython.display import Image, display

# Path to your image folder
folder_path = 'picture'
image_data = []
# Collect the image files in the folder (case-insensitive extension check)
image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.jpg', '.png', '.jpeg'))]

# Display each image and encode it as base64
for image_file in image_files:
    image_path = os.path.join(folder_path, image_file)

    # Show the image inline
    display(Image(filename=image_path))

    # Read the file and encode it as base64
    with open(image_path, 'rb') as image:
        encoded_image = base64.b64encode(image.read()).decode('utf-8')

    # Print the file name
    print(f"File: {image_file}")

    image_data.append(encoded_image)

File: ab2aeb29dbb00e9980cfbe628c72bc50.jpeg

File: b5522937967b7bf1e6ca90acb9200371.jpeg
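
One caveat before the next sections: the data URIs below hard-code image/jpeg, while the folder scan above also accepts PNG files. If mixed types are possible, guessing the MIME type per file is safer. A small sketch with a hypothetical helper, to_data_uri:

import mimetypes

def to_data_uri(path: str, encoded: str) -> str:
    # Fall back to image/jpeg when the type cannot be guessed from the name
    mime, _ = mimetypes.guess_type(path)
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"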

7. Calling the model with multiple images

message = HumanMessage(
    content=[
        {"type": "text", "text": "are these two images the same?"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data[0]}"}},
        {"type": "image_url", "image_url": {"url":  f"data:image/jpeg;base64,{image_data[1]}"}},
    ],
)
response = model.invoke([message])
print(response.content)

Output:

No, these two images are not the same. They depict different animated characters with different hairstyles and clothing.
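
To ask the same question about each image separately, LangChain's batch interface sends the requests in parallel. A short sketch, assuming image_data still holds the encodings from section 6:

messages = [
    [HumanMessage(content=[
        {"type": "text", "text": "describe this image in one sentence"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{d}"}},
    ])]
    for d in image_data
]
for reply in model.batch(messages):
    print(reply.content)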

8. Using tools

from typing import Literal

from langchain_core.tools import tool


@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass


model_with_tools = model.bind_tools([weather_tool])

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model_with_tools.invoke([message])
print(response.tool_calls)

The result shows that the Qwen vision model does not support tool calling; the tool_calls list comes back empty.

[]
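
Since no tool calls come back, one workaround is to ask for a constrained text answer and route it to the tool manually. A minimal sketch (the prompt wording and validation are assumptions, not part of the original flow):

labels = ("sunny", "cloudy", "rainy")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Answer with exactly one word: sunny, cloudy, or rainy."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data[0]}"}},
    ],
)
answer = model.invoke([message]).content.strip().lower()
if answer in labels:
    weather_tool.invoke({"weather": answer})  # invoke the tool ourselves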