使用GPT-4V为电商产品生成标题

Inspired by Lilian Weng’s Lil’Log (https://lilianweng.github.io/), I will document my learnings and understandings frequently on my website. As a data analyst at aosom.com, an online shopping company, I primarily focus on topics related to this field including LLM product description writing, RAG systems for info retrieval, machine learning algorithms, and website data analysis.

今天精读的文章是来自OpenAI的Katia Gil Guzman，链接：https://cookbook.openai.com/examples/tag_caption_images_with_gpt4v.

这篇文章主要介绍了用GPT-4V进行产品的关键词和标题的生成，GPT-4V是OpenAI在2023年9月发布的一个能让用户使用GPT-4分析图像输入的模型。数据用的是亚马逊的家具数据集：

amazon_furniture_dataset.csv963.2KB

首先，需要import需要的库以及加载数据：

# Install dependencies if needed
%pip install openai
%pip install scikit-learn

from IPython.display import Image, display
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from openai import OpenAI

# Initializing OpenAI client - see https://platform.openai.com/docs/quickstart?context=python
client = OpenAI()

# Loading dataset
dataset_path =  "data/amazon_furniture_dataset.csv"
df = pd.read_csv(dataset_path)

关键词生成

作者首先用了GPT-4V最基础的Completion模块生成图片的关键词：

system_prompt = '''
    You are an agent specialized in tagging images of furniture items, decorative items, or furnishings with relevant keywords that could be used to search for these items on a marketplace.
    
    You will be provided with an image and the title of the item that is depicted in the image, and your goal is to extract keywords for only the item specified. 
    
    Keywords should be concise and in lower case. 
    
    Keywords can describe things like:
    - Item type e.g. 'sofa bed', 'chair', 'desk', 'plant'
    - Item material e.g. 'wood', 'metal', 'fabric'
    - Item style e.g. 'scandinavian', 'vintage', 'industrial'
    - Item color e.g. 'red', 'blue', 'white'
    
    Only deduce material, style or color keywords when it is obvious that they make the item depicted in the image stand out.

    Return keywords in the format of an array of strings, like this:
    ['desk', 'industrial', 'metal']
    
'''

这篇提示词让GPT从图片和标题中生成用户搜索关键词(search keywords)，以['desk', 'industrial', 'metal']的格式。

作者引入了利用text embeddings合并相近词的算法：

这个算法能把上面GPT生成的关键词与词库进行对比，使用cosine similarity（余弦相似度）。Cosine similarity通过测量两个向量的夹角的余弦值来度量它们之间的相似性。以我个人经验，关键词词库可以使用Google Ads中的转化率高的用户热搜词。

作者先是把词库的关键词用OpenAI的text-embedding-3-large转化为向量，然后通过比较embedding间的cosine similarity进行同义词合并：

def compare_keyword(keyword):
    embedded_value = get_embedding(keyword)
    df_keywords['similarity'] = df_keywords['embedding'].apply(lambda x: cosine_similarity(np.array(x).reshape(1,-1), np.array(embedded_value).reshape(1, -1)))
    most_similar = df_keywords.sort_values('similarity', ascending=False).iloc[0]
    return most_similar

def replace_keyword(keyword, threshold = 0.6):
    most_similar = compare_keyword(keyword)
    if most_similar['similarity'] > threshold:
        print(f"Replacing '{keyword}' with existing keyword: '{most_similar['keyword']}'")
        return most_similar['keyword']
    return keyword

在这里作者把阈值设置成了0.6, 也就是说两个关键词的余弦相似度大于0.6时，生成的词会被替换为词库中的标准搜索词。

给商品起标题

作者接下来用这一段提示词让GPT-4V生成图片描述：

describe_system_prompt = '''
    You are a system generating descriptions for furniture items, decorative items, or furnishings on an e-commerce website.
    Provided with an image and a title, you will describe the main item that you see in the image, giving details but staying concise.
    You can describe unambiguously what the item is and its material, color, and style if clearly identifiable.
    If there are multiple items depicted, refer to the title to understand which item you should describe.
    '''

这是一段生成的描述：

'''
This is a free-standing shoe rack featuring a multi-layer design, constructed from metal for durability. The rack is finished in a clean white color, which gives it a modern and versatile look, suitable for various home decor styles. It includes several horizontal shelves dedicated to organizing shoes, providing ample space for multiple pairs.

Additionally, the rack is equipped with 8 double hooks, which are integrated into the frame above the shoe shelves. These hooks offer extra functionality, allowing for the hanging of accessories such as hats, scarves, or bags. The design is space-efficient and ideal for placement in living rooms, bathrooms, hallways, or entryways, where it can serve as a practical storage solution while contributing to the tidiness and aesthetic of the space.
'''

生成完长描述后，使用以下few-shot的prompt生成产品标题。Few-shot在机器学习中是指用非常少量的标注样本来训练模型识别新类别或执行新任务的能力。在这里，作者给出了三个例子：

caption_system_prompt = '''
Your goal is to generate short, descriptive captions for images of furniture items, decorative items, or furnishings based on an image description.
You will be provided with a description of an item image and you will output a caption that captures the most important information about the item.
Your generated caption should be short (1 sentence), and include the most relevant information about the item.
The most important information could be: the type of the item, the style (if mentioned), the material if especially relevant and any distinctive features.
'''

few_shot_examples = [
    {
        "description": "This is a multi-layer metal shoe rack featuring a free-standing design. It has a clean, white finish that gives it a modern and versatile look, suitable for various home decors. The rack includes several horizontal shelves dedicated to organizing shoes, providing ample space for multiple pairs. Above the shoe storage area, there are 8 double hooks arranged in two rows, offering additional functionality for hanging items such as hats, scarves, or bags. The overall structure is sleek and space-saving, making it an ideal choice for placement in living rooms, bathrooms, hallways, or entryways where efficient use of space is essential.",
        "caption": "White metal free-standing shoe rack"
    },
    {
        "description": "The image shows a set of two dining chairs in black. These chairs are upholstered in a leather-like material, giving them a sleek and sophisticated appearance. The design features straight lines with a slight curve at the top of the high backrest, which adds a touch of elegance. The chairs have a simple, vertical stitching detail on the backrest, providing a subtle decorative element. The legs are also black, creating a uniform look that would complement a contemporary dining room setting. The chairs appear to be designed for comfort and style, suitable for both casual and formal dining environments.",
        "caption": "Set of 2 modern black leather dining chairs"
    },
    {
        "description": "This is a square plant repotting mat designed for indoor gardening tasks such as transplanting and changing soil for plants. It measures 26.8 inches by 26.8 inches and is made from a waterproof material, which appears to be a durable, easy-to-clean fabric in a vibrant green color. The edges of the mat are raised with integrated corner loops, likely to keep soil and water contained during gardening activities. The mat is foldable, enhancing its portability, and can be used as a protective surface for various gardening projects, including working with succulents. It's a practical accessory for garden enthusiasts and makes for a thoughtful gift for those who enjoy indoor plant care.",
        "caption": "Waterproof square plant repotting mat"
    }
]

formatted_examples = [[{
    "role": "user",
    "content": ex['description']
},
{
    "role": "assistant", 
    "content": ex['caption']
}]
    for ex in few_shot_examples
]

formatted_examples = [i for ex in formatted_examples for i in ex]

def caption_image(description, model="gpt-4-turbo-preview"):
    messages = formatted_examples
    messages.insert(0, 
        {
            "role": "system",
            "content": caption_system_prompt
        })
    messages.append(
        {
            "role": "user",
            "content": description
        })
    response = client.chat.completions.create(
    model=model,
    temperature=0.2,
    messages=messages
    )

    return response.choices[0].message.content

这是一个生成的例子：

Description 描述

Caption 标题

This is a LOVMOR 30'' Bathroom Vanity Sink Base Cabinet featuring a classic design with a rich brown finish. The cabinet is designed to provide ample storage with three drawers on the left side, offering organized space for bathroom essentials. The drawers are likely to have smooth glides for easy operation. Below the drawers, there is a large cabinet door that opens to reveal additional storage space, suitable for larger items. The paneling on the drawers and door features a raised, framed design, adding a touch of elegance to the overall appearance. This vanity base is versatile and can be used not only in bathrooms but also in kitchens, laundry rooms, and other areas where extra storage is needed. The construction material is not specified, but it appears to be made of wood or a wood-like composite. Please note that the countertop and sink are not included and would need to be purchased separately.

LOVMOR 30'' classic brown bathroom vanity base cabinet with three drawers and additional storage space.

可以看到，作者仅仅给出了三个例子，标题就完美生成为作者想要的长度和格式。