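1. Why collect SKU data
In fine-grained e-commerce operations, price comparison, price monitoring, inventory synchronization, automated replenishment, and competitor analysis, the SKU (Stock Keeping Unit) is the smallest and most stable product unit, and it cannot be split any further. JD splits the different colors, sizes, editions, and bundles under a single SPU (Standard Product Unit) into separate SKUs, each with its own id, price, stock, promotions, images, and specification parameters. Whether SKU-level data can be fetched precisely rather than by brute force therefore directly determines how accurate the downstream business logic is.

2. How JD's front-end rendering pipeline has evolved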
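Before 2016: HTML was rendered directly on the server, with JSON embedded in the page source.
2017-2019: isomorphic React was introduced, and the HTML still kept global variables such as window.pageConfig and window.skudata.
2020 onward: most pages moved to isomorphic Next.js plus SSR on JFS (JD's front-end service). The first-screen HTML is only a skeleton, and the real SKU data sits behind asynchronous APIs. Those APIs also gained request signing, slider challenges, and m.jd.com protocol headers, so risk control became much stricter.
Conclusion:
– As of 2025, getting 100% accurate SKU data requires two approaches working together: a browser environment and API reverse engineering.
– The core of the reverse engineering is locating the "SKU aggregation API" and reconstructing its signature algorithm.

3. Overall technical approach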
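Browser layer: Puppeteer/Playwright emulates a real device environment to get past slider, drag-bar, and smart CAPTCHA challenges.
Reverse-engineering layer: in DevTools, capture the complete requests and responses of the SKU APIs (getWareBusiness, getSeparateWareStyle, getSizeColor, and so on), then isolate the query parameters, cookie, sign, and functionId.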
Pure back-end layer: reproduce the signature algorithm in Python and make high-concurrency calls without a browser.

4. Environment setup
System: Ubuntu 22.04

python3.11 -m venv venv
source venv/bin/activate
pip install playwright httpx requests loguru pydantic fake-useragent
playwright install chromium webkit

5. Browser layer: capturing the raw API responses with Playwright
file: jd_sku_browser.py

import asyncio
import json

from playwright.async_api import async_playwright
from loguru import logger

JD_ITEM_URL = "https://item.jd.com/100012043978.html"


async def run():
    async with async_playwright() as pw:
        iphone_ua = (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
        )
        # Start from Playwright's built-in device descriptor and override the UA.
        # default_browser_type only names the engine to launch, so it is dropped
        # as a precaution before the remaining options are passed to new_page.
        iphone = {k: v for k, v in pw.devices["iPhone 14 Pro"].items() if k != "default_browser_type"}
        iphone["user_agent"] = iphone_ua
        browser = await pw.webkit.launch(headless=False, slow_mo=200)
        page = await browser.new_page(**iphone)
        await page.route("**/*", lambda route: route.continue_())  # let every request through
        res_list = []

        async def handle_res(res):
            # Keep only the two SKU APIs of interest.
            url = res.url
            if "getWareBusiness" in url or "getSeparateWareStyle" in url:
                try:
                    txt = await res.text()
                    res_list.append({"url": url, "json": json.loads(txt)})
                except Exception as e:
                    logger.warning(e)

        page.on("response", handle_res)
        await page.goto(JD_ITEM_URL, timeout=30_000)
        await page.wait_for_load_state("networkidle")
        await asyncio.sleep(3)  # leave time for late asynchronous calls
        await browser.close()
        return res_list


if __name__ == "__main__":
    print(asyncio.run(run()))

Running it prints the two core JSON payloads to the terminal:
– getWareBusiness: price, stock, promotions, Plus price, and flash-sale price.
– getSeparateWareStyle: the color, size, edition, and bundle dimensions with their corresponding SKU IDs.

6. Reverse-engineering layer: isolating the key parameters
Taking getWareBusiness as an example, the URL shown in the browser's Network panel is:
https://api.m.jd.com/api?appid=item-view&functionId=getWareBusiness&client=m&clientVersion=12.0.0&uuid=163521234567890&t=1724745123456&skuId=100012043978&sign=c5c4c3b2a1...
Breakpoint debugging (webpack:// jd-module-sign.js) shows that sign is computed as:

sign = md5(
    "functionId=" + functionId +
    "&body=" + body +
    "&uuid=" + uuid +
    "&client=" + client +
    "&clientVersion=" + clientVersion +
    "&t=" + t +
    "&appid=" + appid +
    "&token=" + (token || "")
)

Here body is the URL-encoded JSON string and token is empty.

7. Reproducing the signature algorithm in Python
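Before the signature is reproduced, it can help to split a captured request URL into its individual parameters so that each one can be compared against the fields of the signed string. The following is a minimal sketch using only the standard library. The URL is the one from the Network panel capture above (its sign value is truncated in the capture and is left that way), and `split_captured_url` is a hypothetical helper name introduced for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# URL as captured in the Network panel; the sign value is truncated in the source.
CAPTURED_URL = (
    "https://api.m.jd.com/api?appid=item-view&functionId=getWareBusiness"
    "&client=m&clientVersion=12.0.0&uuid=163521234567890&t=1724745123456"
    "&skuId=100012043978&sign=c5c4c3b2a1..."
)


def split_captured_url(url: str) -> dict[str, str]:
    """Return the query parameters of a captured request as a flat dict."""
    query = urlsplit(url).query
    # parse_qs maps every key to a list; this capture carries one value per key.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = split_captured_url(CAPTURED_URL)
for name in ("functionId", "appid", "client", "clientVersion", "uuid", "t", "skuId"):
    print(f"{name:14s} {params[name]}")
```

Listing the parameters this way makes it easier to see which values are fixed (appid, client, clientVersion) and which change per request (uuid, t, sign).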
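file: jd_sign.py

import hashlib
import json
import time
import urllib.parse


def jd_sign(
    function_id: str,
    body: dict,
    uuid: str,
    client: str = "m",
    client_version: str = "12.0.0",
    appid: str = "item-view",
) -> tuple[str, str]:
    """Return (sign, t): the MD5 signature and the millisecond timestamp used to compute it."""
    # body is serialized as compact JSON and then URL-encoded before signing.
    body_str = urllib.parse.quote(json.dumps(body, separators=(",", ":"), ensure_ascii=False))
    t = str(int(time.time() * 1000))
    raw = (
        f"functionId={function_id}"
        f"&body={body_str}"
        f"&uuid={uuid}"
        f"&client={client}"
        f"&clientVersion={client_version}"
        f"&t={t}"
        f"&appid={appid}"
        "&token="  # token is empty
    )
    return hashlib.md5(raw.encode()).hexdigest(), t

8. Pure back end: pulling SKU-level data at high concurrency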
file: jd_sku.py

import asyncio
import json
import random

import httpx
from loguru import logger

from jd_sign import jd_sign

BASE = "https://api.m.jd.com/api"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
    ),
    "Referer": "https://item.m.jd.com/",
}


async def fetch_sku(sku_id: str):
    # A random 15-digit uuid per request, matching the length seen in the capture.
    uuid = "".join(random.choices("0123456789", k=15))
    body = {"skuId": sku_id, "catId": "", "areaId": "19_1601_50258_51885"}
    sign, t = jd_sign("getWareBusiness", body, uuid)
    params = {
        "appid": "item-view",
        "functionId": "getWareBusiness",
        "client": "m",
        "clientVersion": "12.0.0",
        "uuid": uuid,
        "t": t,  # must be the same timestamp that was signed
        "skuId": sku_id,
        "sign": sign,
        # Same compact serialization that jd_sign hashes.
        "body": json.dumps(body, separators=(",", ":"), ensure_ascii=False),
    }
    async with httpx.AsyncClient(timeout=10, headers=HEADERS) as cli:
        r = await cli.get(BASE, params=params)
        r.raise_for_status()
        return r.json()


if __name__ == "__main__":
    sku = "100012043978"
    data = asyncio.run(fetch_sku(sku))
    print(json.dumps(data, ensure_ascii=False, indent=2))

9. Parsing and persisting the results
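The field paths described in this section can be collected into one flat record before anything is written to storage. The following is a minimal sketch, assuming the response has the shape this section describes (data["price"]["p"], data["stock"]["skuState"], and so on). `to_row` is a hypothetical helper, and the sample payload is made up for illustration rather than captured from JD.

```python
from decimal import Decimal
from typing import Any

# Stock state codes as listed in this section.
STOCK_STATE = {33: "in stock", 34: "pre-order", 40: "out of stock"}


def to_row(sku_id: str, data: dict[str, Any]) -> dict[str, Any]:
    """Flatten one response into a record; missing fields become None or defaults."""
    price = data.get("price", {})
    stock = data.get("stock", {})
    state = stock.get("StockState")
    return {
        "sku_id": int(sku_id),
        # Go through str() so float inputs do not leak binary rounding into Decimal.
        "price": Decimal(str(price["p"])) if "p" in price else None,
        "original_price": Decimal(str(price["op"])) if "op" in price else None,
        "stock": int(stock.get("skuState", 0)),
        "stock_state": STOCK_STATE.get(int(state), "unknown") if state is not None else "unknown",
        "promotion_count": len(data.get("promotion", [])),
    }


# Hypothetical payload with invented values, shaped like the fields described here.
sample = {
    "price": {"p": "5999.00", "op": "6499.00"},
    "stock": {"skuState": 1, "StockState": 33},
    "promotion": [{"type": "flash sale"}],
}
row = to_row("100012043978", sample)
print(row)
```

Keeping prices as Decimal rather than float matches a DECIMAL(10,2) column and avoids rounding surprises when prices are compared across refreshes.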
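Price: data["price"]["p"] is the current selling price, and data["price"]["op"] is the original price.
Stock: data["stock"]["skuState"] is 1 when in stock and 0 when out of stock; data["stock"]["StockState"] is 33 for in stock, 34 for pre-order, and 40 for out of stock.
Promotions: data["promotion"] is an array covering threshold discounts, flash sales, and Baitiao installments.
SKU dimensions: data["colorSize"] is the color/size matrix; each record contains skuId, image, and name.
Example MySQL table for persistence:

CREATE TABLE jd_sku (
    id BIGINT PRIMARY KEY,
    sku_id BIGINT UNIQUE,
    spu_id BIGINT,
    name VARCHAR(255),
    price DECIMAL(10,2),
    stock TINYINT,
    spec JSON,
    utime TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Wrapping the script in a scheduled job that batch-refreshes every minute gives minute-level price and stock monitoring.

10. Risk control and compliance notes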
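JD applies risk control to the UA, Cookie, sign, slider challenges, and per-IP request frequency. Recommendations:
– Keep concurrency reasonable (QPS < 1).
– Rotate residential proxies, for example 亿牛云.
– Comply with robots.txt and the JD Open Platform ToS.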
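The QPS < 1 guideline can be enforced in code rather than by convention. The following is a minimal sketch of an asyncio throttle that spaces calls at least `min_interval` seconds apart. `Throttle`, `crawl`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request coroutine such as fetch_sku. The demo uses a short interval so it finishes quickly; a real job would use an interval above 1.0 second.

```python
import asyncio
import time


class Throttle:
    """Allow at most one call per min_interval seconds across all tasks."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0  # monotonic time of the next permitted call

    async def wait(self) -> None:
        # The lock serializes waiters so concurrent tasks cannot share a slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def fake_fetch(sku_id: str) -> str:
    """Placeholder for a real network call."""
    return f"fetched {sku_id}"


async def crawl(sku_ids: list[str], throttle: Throttle) -> list[str]:
    results = []
    for sku_id in sku_ids:
        await throttle.wait()  # blocks until the next slot opens
        results.append(await fake_fetch(sku_id))
    return results


async def demo() -> tuple[list[str], float]:
    throttle = Throttle(min_interval=0.05)  # demo value; use > 1.0 for QPS < 1
    start = time.monotonic()
    results = await crawl(["1", "2", "3"], throttle)
    return results, time.monotonic() - start


results, elapsed = asyncio.run(demo())
print(results, f"{elapsed:.2f}s")
```

Because the throttle holds a lock while it waits, the same instance can also be shared by tasks started with asyncio.gather and the spacing still holds.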