WZ — Case · 03
Filed · 2026 / Q2
SF Bay Area · 37.77°N
v.01 · current
Pillar 03 — AI workflow leverage 第三根支柱 —— AI 工作流杠杆

A contribution intelligence layer that turns commits into auditable signal.

把提交记录读成可审计的角色信号。

Engineering output is mostly invisible to non-engineers. I designed a four-phase pipeline that reads commits with Claude, generates contributor profiles, and exposes the methodology so the judgment is auditable, not vibes.

工程产出对非工程人来说几乎是黑箱。我设计了一条四阶段管线:Claude 读 commit、生成贡献者画像、再把方法论本身公开成一页,让判断可被反问,而不是靠感觉。

01 CONTEXT 背景

Engineering output was invisible to anyone outside engineering.

工程产出对工程之外的人,几乎是黑箱。

I was inside a fast-growing Series A AI company where headcount had crossed the point at which any one person could remember who shipped what. Engineering moved across many repos, in many time zones, on a release cadence that nobody outside the team could narrate.

The consequence was predictable. Performance conversations and recognition decisions kept landing on the wrong axes — visible PR counts, who spoke up in meetings, who happened to sit next to whom. The actual shape of contribution was not on the table, because the table did not have it.

I wanted one artifact that turned the company's real GitHub activity into something a leadership team could read in five minutes — and that a contributor could read about themselves and recognize as fair.

那时我在一家快速扩张的 Series A AI 公司里面。人数早已过了「一个人还能记住谁做了什么」的拐点,工程团队散在很多仓库、很多时区,发布节奏只有工程内部讲得清。

结果也很可预测。绩效讨论和被认可的人,最后落在错的维度上——谁的 PR 数显眼、谁在会议上声音大、谁恰好坐在谁旁边。贡献真正的形状不在桌面上,因为这张桌面根本就没有它。

我想要一份能放在桌面上的东西:让管理层五分钟读完,公司真实的 GitHub 活动到底是什么样;也让任何一位被它描述的工程师读到自己时,能承认这是公平的。

02 PROBLEM 问题

Commit diffs don't speak human until something reads them carefully.

commit diff 不会自己说人话,得有东西耐心读它。

GitHub will hand you 16 repos and tens of thousands of commits across a few dozen contributors. What it will not hand you is shape. Raw PR counts are a misleading proxy — one small bugfix can outweigh a month of careful refactor work. Reviews and issues matter, often more than authored code, and live in entirely different APIs. Reading commit diffs by hand does not scale past a single repo, let alone sixteen.

The hard problem was never "look at GitHub." The hard problem was turning diffs into substantively described work in a way that survives audit: every label on a person traces back to the line of code that produced it, and the methodology itself can be argued with on its own page.

The system had to do four things at once: read code carefully, attribute it to the right person, aggregate it into a profile, and show its work. Skip any of them and the output is just another dashboard nobody trusts.

GitHub 会给你 16 个仓库、几万条 commit、几十位贡献者。它不会顺手给你的是「形状」。原始 PR 计数是个有误导性的代理:一个小 bugfix 可以盖过一个月认真的重构。Review 和 issue 同样重要,常常比写代码更重要,却散在完全不同的 API 里。靠人手读 diff 撑不过一个仓库,更别说十六个。

真正难的问题从来不是「看看 GitHub」。难的是把 diff 翻译成有实质描述的工作,并且让结论经得起追问——任何一个贴在人身上的标签,都要能回溯到产生它的那段代码;方法论本身也得能在它自己的一页上被反问。

系统要同时做四件事:认真读代码、归属到正确的人、聚合成画像、把自己的工作过程暴露出来。少做一件,产出就只是又一个没人信的看板。

03 APPROACH 方法

A four-phase pipeline, with the LLM split where it actually pays.

一条四阶段管线,把模型放在真正划算的环节。

Phase 0 — Identity resolution.

Reconcile GitHub handles against the internal roster so every contribution lands on the right person. Engineers use multiple handles, bot accounts show up, and squash merges erase co-authors. Without this layer every downstream phase is noise dressed up as signal, so I treated it as the foundation, not an afterthought.
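A minimal sketch of what this layer could look like, assuming the roster is available as a list of people with their known handles. The `RosterEntry` shape, the function names, and the hand-maintained bot list are illustrative assumptions, not the actual codebase. The `[bot]` suffix check relies on how GitHub names App accounts (for example `dependabot[bot]`).

```typescript
// Hypothetical roster shape: one person, possibly many GitHub handles.
interface RosterEntry {
  personId: string;
  handles: string[];
}

type Resolution =
  | { kind: "person"; personId: string }
  | { kind: "bot" }
  | { kind: "unknown"; handle: string };

// GitHub App accounts end in "[bot]"; any other bots are assumed to be
// listed by hand (lowercase) in knownBots.
function isBot(handle: string, knownBots: Set<string>): boolean {
  const h = handle.toLowerCase();
  return h.endsWith("[bot]") || knownBots.has(h);
}

// Build a case-insensitive handle -> personId index once per run.
function buildHandleIndex(roster: RosterEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  roster.forEach((entry) => {
    entry.handles.forEach((handle) => {
      index.set(handle.toLowerCase(), entry.personId);
    });
  });
  return index;
}

// Unknown handles are returned as "unknown" instead of being dropped,
// so a human can extend the roster before downstream phases run.
function resolveHandle(
  handle: string,
  index: Map<string, string>,
  knownBots: Set<string>
): Resolution {
  if (isBot(handle, knownBots)) return { kind: "bot" };
  const personId = index.get(handle.toLowerCase());
  return personId !== undefined
    ? { kind: "person", personId }
    : { kind: "unknown", handle };
}
```

Keeping `unknown` as an explicit outcome is the design choice that matters here: a silent drop would make a contributor vanish from every later phase.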

Phase 1 — PR collection.

Pull merged PRs from each tracked repo via the GitHub API, walk every commit inside them, and attribute work at the PR level: one PR fans out to many commits and resolves to one author chain. This is the shape that matches how the team ships, which matters more than raw commit volume.
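One way to picture the "one PR, many commits, one author chain" shape is the sketch below. It assumes PRs and their commits have already been fetched into plain records; the record shapes and the `authorChain` name are hypothetical. Co-authors are read from `Co-authored-by:` trailers in commit messages, which is the convention GitHub uses for crediting co-authors.

```typescript
// Hypothetical, already-fetched records; field names are illustrative.
interface CommitRecord {
  sha: string;
  authorLogin: string;
  message: string;
}

interface PullRequestRecord {
  repo: string;
  number: number;
  authorLogin: string;
  commits: CommitRecord[];
}

// Extract co-author emails from "Co-authored-by: Name <email>" trailers.
function coAuthorEmails(message: string): string[] {
  const re = /^Co-authored-by:[^<\n]*<([^>\n]+)>/gim;
  const emails: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) {
    const email = m[1];
    if (email) emails.push(email.toLowerCase());
  }
  return emails;
}

// Ordered, de-duplicated identities behind one PR: the PR author first,
// then commit authors and trailer co-authors in commit order.
function authorChain(pr: PullRequestRecord): string[] {
  const seen = new Set<string>();
  const chain: string[] = [];
  const add = (id: string): void => {
    if (!seen.has(id)) {
      seen.add(id);
      chain.push(id);
    }
  };
  add(pr.authorLogin.toLowerCase());
  pr.commits.forEach((commit) => {
    add(commit.authorLogin.toLowerCase());
    coAuthorEmails(commit.message).forEach(add);
  });
  return chain;
}
```

The chain mixes logins and emails on purpose in this sketch; both would still need to pass through identity resolution before they count as people.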

Phase 1.5 — Reviews and issues.

Supplement PRs with review counts and issue activity from the Search API, run as a sidecar to the main pipeline. Some of the most central engineers on a team review heavily and author lightly: they are invisible to PR-count metrics and conspicuous the moment reviews enter the frame.
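As a rough illustration, the sidecar queries could be built like this. The sketch assumes GitHub's REST issue search endpoint with the `repo:`, `is:pr`, `is:issue`, `reviewed-by:`, `author:`, and `created:` qualifiers; the helper names and the date window are hypothetical. Note that `reviewed-by:` counts PRs a person reviewed, not individual review submissions, so it is a proxy for review load.

```typescript
const SEARCH_ENDPOINT = "https://api.github.com/search/issues";

// Join qualifiers into one q= parameter. per_page=1 because only the
// total_count field of the response is needed for a count.
function searchUrl(qualifiers: string[]): string {
  const q = encodeURIComponent(qualifiers.join(" "));
  return SEARCH_ENDPOINT + "?q=" + q + "&per_page=1";
}

// PRs in `repo` that `login` reviewed, created on or after `since` (YYYY-MM-DD).
function reviewedPrsUrl(repo: string, login: string, since: string): string {
  return searchUrl([
    "repo:" + repo,
    "is:pr",
    "reviewed-by:" + login,
    "created:>=" + since,
  ]);
}

// Issues in `repo` opened by `login`, created on or after `since`.
function openedIssuesUrl(repo: string, login: string, since: string): string {
  return searchUrl([
    "repo:" + repo,
    "is:issue",
    "author:" + login,
    "created:>=" + since,
  ]);
}
```

The Search API has its own, stricter rate limit than the rest of the REST API, which is one reason to keep this pass separate from the main PR walk.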

Phase 2 — Diff analysis with Claude Haiku.

Run each commit diff through Claude Haiku to extract structured signal per commit: substantiveness, work type (feature / bug / chore / refactor), and auto-generated-code ratio. This is where the actual reading happens. Diffs are cached to Postgres and truncated at 30K characters so a re-run does not re-fetch from GitHub and does not blow up on a single oversized commit.
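The mechanics around the model call can be sketched without the call itself. The example below assumes a 0 to 1 scale for substantiveness and auto-generated ratio, a JSON reply from the model, and an in-memory `Map` standing in for the Postgres diff cache; all names are illustrative. Whether the real cache stores the truncated or the full diff is not stated, so storing the truncated text is an assumption of this sketch.

```typescript
const MAX_DIFF_CHARS = 30000; // the 30K-character cap described above

type WorkType = "feature" | "bug" | "chore" | "refactor";
const WORK_TYPES: WorkType[] = ["feature", "bug", "chore", "refactor"];

interface CommitSignal {
  substantiveness: number; // assumed scale: 0..1
  workType: WorkType;
  autoGeneratedRatio: number; // assumed scale: 0..1
}

function truncateDiff(diff: string, maxChars: number = MAX_DIFF_CHARS): string {
  return diff.length <= maxChars ? diff : diff.slice(0, maxChars);
}

// Cache-first lookup. The Map stands in for a Postgres table keyed by SHA;
// fetchDiff is synchronous here only to keep the sketch short.
function getDiff(
  sha: string,
  cache: Map<string, string>,
  fetchDiff: (sha: string) => string
): string {
  const hit = cache.get(sha);
  if (hit !== undefined) return hit;
  const diff = truncateDiff(fetchDiff(sha));
  cache.set(sha, diff);
  return diff;
}

// Validate the model's JSON reply; anything malformed becomes null so a
// bad reply is retried or skipped instead of polluting the profile.
function parseSignal(raw: string): CommitSignal | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const s = d["substantiveness"];
  const a = d["autoGeneratedRatio"];
  const workType = WORK_TYPES.find((t) => t === d["workType"]);
  const inUnit = (v: unknown): v is number =>
    typeof v === "number" && v >= 0 && v <= 1;
  if (workType === undefined || !inUnit(s) || !inUnit(a)) return null;
  return { substantiveness: s, workType, autoGeneratedRatio: a };
}
```

Validating the reply against a closed set of work types keeps every label enumerable, which is what lets a later reader trace a profile claim back to specific commits.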

Phase 3 — Profile generation with Claude Sonnet.

Aggregate Phase 2 per contributor and synthesize: a title-like descriptor, a centrality estimate, a role read, and a short list of key deliverables. Sonnet does not see raw diffs — it sees already-summarized commits. That ordering is the whole point.
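To make the ordering concrete, here is a sketch of the aggregation step that would sit between the two models: it folds per-commit results into one compact digest per contributor, and only that digest would be handed to the synthesis pass. The `SummarizedCommit` and `ContributorDigest` shapes, the one-line `summary` field, and the top-N cutoff are assumptions for illustration. The descriptor, centrality estimate, and role read are the model's output and are not computed here.

```typescript
// Hypothetical per-commit output of the diff-reading phase.
interface SummarizedCommit {
  personId: string;
  workType: "feature" | "bug" | "chore" | "refactor";
  substantiveness: number; // assumed scale: 0..1
  summary: string; // assumed one-line description of the commit
}

// What the synthesis model would see instead of raw diffs.
interface ContributorDigest {
  personId: string;
  commitCount: number;
  byWorkType: Record<string, number>;
  meanSubstantiveness: number;
  topSummaries: string[];
}

function buildDigests(
  commits: SummarizedCommit[],
  topN: number = 5
): ContributorDigest[] {
  // Group commits by contributor, preserving first-seen order.
  const groups = new Map<string, SummarizedCommit[]>();
  commits.forEach((c) => {
    const group = groups.get(c.personId);
    if (group) group.push(c);
    else groups.set(c.personId, [c]);
  });

  const digests: ContributorDigest[] = [];
  groups.forEach((items, personId) => {
    const byWorkType: Record<string, number> = {};
    let total = 0;
    items.forEach((c) => {
      byWorkType[c.workType] = (byWorkType[c.workType] ?? 0) + 1;
      total += c.substantiveness;
    });
    // Keep only the most substantive commits as candidate key deliverables.
    const topSummaries = items
      .slice()
      .sort((a, b) => b.substantiveness - a.substantiveness)
      .slice(0, topN)
      .map((c) => c.summary);
    digests.push({
      personId,
      commitCount: items.length,
      byWorkType,
      meanSubstantiveness: total / items.length,
      topSummaries,
    });
  });
  return digests;
}
```

Because the digest is small and structured, it can also be stored next to the generated profile, which gives the methodology page something concrete to point at.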

Why both models.

Haiku is cheap and runs across many diffs; Sonnet is the synthesis pass over already-summarized work. The split materially controls cost — the cheap model does the wide read, the expensive one does the narrow write. Collapsing them into a single model either inflates spend by an order of magnitude or sacrifices the substance of the per-commit read.
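The cost argument can be written down as a small model. This is a back-of-the-envelope sketch, not the project's accounting: it counts input tokens only, and every number fed into it (volumes, token sizes, prices) is a placeholder to be replaced with real measurements and current pricing.

```typescript
// All fields are inputs supplied by the caller; none are real figures.
interface CostInputs {
  diffCount: number; // diffs read in the wide pass
  tokensPerDiff: number; // average input tokens per diff
  contributorCount: number; // profiles written in the narrow pass
  tokensPerProfile: number; // average input tokens per profile
  cheapPricePerMTok: number; // price per million tokens, cheap model
  expensivePricePerMTok: number; // price per million tokens, expensive model
}

// Split pipeline: cheap model reads every diff, expensive model writes profiles.
function splitCost(i: CostInputs): number {
  const wide = i.diffCount * i.tokensPerDiff * i.cheapPricePerMTok;
  const narrow = i.contributorCount * i.tokensPerProfile * i.expensivePricePerMTok;
  return (wide + narrow) / 1000000;
}

// Single-model pipeline: the expensive model does both passes.
function singleModelCost(i: CostInputs): number {
  const wide = i.diffCount * i.tokensPerDiff * i.expensivePricePerMTok;
  const narrow = i.contributorCount * i.tokensPerProfile * i.expensivePricePerMTok;
  return (wide + narrow) / 1000000;
}
```

When the wide read dominates total tokens, the ratio of the two results approaches the price ratio between the models, so the saving is driven by how far apart the two prices are.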

第零阶段 —— 身份归并

把 GitHub 账号和公司花名册对齐,每条贡献都落到正确的人身上。工程师常有多个账号,机器人账号会冒出来,squash 合并还会抹掉共同作者。没有这一层,后面每一阶段都只是被包装得像信号的噪音,所以我把它当作地基,不是补丁。

第一阶段 —— PR 采集

从每个被跟踪的仓库拉已合并的 PR,走完里面的每一条 commit,再以 PR 为中心做归属:一个 PR 展开成很多 commit,再收敛到一条作者链。这才贴合团队真实的交付方式,比单纯比 commit 数量更有意义。

第 1.5 阶段 —— Reviews 与 issues

用 Search API 把 review 计数和 issue 活跃度补上,作为主管线的旁路。一个团队里最关键的人,往往评审得很多、自己写得少——只看 PR 数他们是隐形的,把 review 拉进画面里他们立刻显形。

第二阶段 —— 用 Claude Haiku 读 diff

每一条 commit 的 diff 进 Haiku,抽出结构化信号:实质程度、工作类型(feature / bug / chore / refactor)、自动生成代码比例。真正「读代码」的事情发生在这一步。diff 缓存到 Postgres,截断到 3 万字符以内,重跑不再回 GitHub 取,也不会因为某条超大 commit 把整条管线打挂。

第三阶段 —— 用 Claude Sonnet 生成画像

把第二阶段的结果按贡献者聚合,再合成出画像:一个像称号的描述、一个核心度估计、一段角色判读、一份重点交付清单。Sonnet 看不到原始 diff,它看到的是已经被总结过的 commit。这个顺序就是关键。

为什么两个模型一起用

Haiku 便宜,适合大面积扫;Sonnet 是已经被压缩过的工作之上的合成层。这种分工对成本是实质性的——便宜的模型去做广读,贵的模型负责窄写。把它们合成一个模型,要么花费翻一个数量级,要么牺牲掉每条 commit 被认真读的那一层。

04 WHAT I BUILT 我做了什么

A pipeline, a portal, and a methodology page that makes the synthesis arguable.

一条管线、一个门户、一页可被反问的方法论。

PIPELINE: IDENTITY → PRS → DIFF READ → PROFILE → PORTAL
PHASE 0 · Identity · handle ↔ roster
PHASE 1 · PRs · merged · PR-centric
PHASE 1.5 · Reviews + Issues
PHASE 2 · LLM · Claude Haiku · diff read · per-commit
PHASE 3 · LLM · Claude Sonnet · profile synthesis
OUTPUT · Web portal + /methodology
CACHE LAYER: Postgres diff store · re-runs do not re-fetch from GitHub API
TRUNCATION: 30K char cap per diff · Phase 2 cost reduced ~60%

Fig. 03: Pipeline. Accent on the two LLM-driven phases.
图 03:管线总览,重点标出两个由模型驱动的阶段。

  • Four-phase pipeline with a reviews sidecar (identity → PRs → reviews and issues at Phase 1.5 → Haiku diff read → Sonnet profile)
  • PR-centric attribution model handling multi-author commit chains and squash merges
  • Diff cache in Postgres — Phase 2 re-runs never re-fetch from the GitHub API
  • 30K-character diff truncation to control Phase 2 token cost
  • groupBy aggregation queries — the employee list does not load the full contribution table
  • Public methodology page so the synthesis is auditable, not opaque
  • Web portal (Next.js 15) + analyzer CLI (TypeScript)
  • 四阶段管线(身份归并 → PR 采集 → review 旁路 → Haiku 读 diff → Sonnet 出画像)
  • 以 PR 为中心的归属模型,处理多作者 commit 链与 squash 合并
  • diff 缓存到 Postgres,第二阶段重跑不再回 GitHub API 取
  • diff 截断到 3 万字符以内,控住第二阶段的 token 成本
  • 员工列表用 groupBy 聚合查询,不加载完整贡献表
  • 对外可读的方法论页面,让合成结果可被反问,而不是黑箱
  • Web 门户(Next.js 15)+ 分析器命令行(TypeScript)
05 OUTCOME 结果

Leadership stopped routing through PR counts. They started reading the profile.

管理层不再绕着 PR 数走,他们开始读画像。

16 | Internal repos under analysis | 纳入分析的内部仓库
4 | Pipeline phases · Haiku + Sonnet split | 管线阶段 · Haiku + Sonnet 分工
60% | Phase 2 cost reduction (cache + truncation) | 第二阶段成本降低
1 | Auditable methodology page | 可审计的方法论页面

The interesting outcome was not in the dashboard. It was upstream of the dashboard, in how leadership conversations actually moved. Discussions stopped routing through "how many PRs did X merge this quarter" and started routing through Phase 3 profile reads — the title-like descriptor, the centrality estimate, the role read, the short list of key deliverables. The shape of the contribution finally fit on the page.

The methodology page did the second piece of work. It became the trust contract: any contributor could read why the profile said what it said, trace the label back to the commits it summarized, and disagree on specifics rather than reject the whole thing. That is a different shape of conversation than the one a closed dashboard produces.

真正有意思的不是看板本身,而是看板上游——管理层讨论的走向发生了变化。话题不再绕着「这季度某某合了多少 PR」转,而是直接读第三阶段的画像:那个像称号的描述、核心度估计、角色判读、一份重点交付清单。贡献的形状终于装得进一页里。

方法论那一页完成了另一半工作。它成了信任契约:任何一位贡献者都可以读到自己被这样描述的原因,把标签回溯到它总结过的那些 commit,再去对具体内容提异议,而不是一上来就推翻整套结果。这是一种封闭看板永远造不出来的讨论形状。

06 WHAT IT TAUGHT ME 学到了什么

The pipeline was the easy half.

管线只是简单的那一半。

I started this build thinking the hard part was the pipeline — model selection, identity resolution, the diff cache, the truncation math, the cost shape. Those were the parts I could plan on a whiteboard and ship in evenings. The hard part turned out to be the methodology page: writing it precisely enough that the first contributor who disagreed with their own profile could read it, find the exact step they wanted to argue with, and argue with that step rather than with me. Every word on that page was load-bearing — softening it killed the audit trail, hardening it killed the willingness to read it. The lesson:

The hard part of measuring engineering isn't building the pipeline — it's writing the methodology page that survives the first person who disagrees with their own profile.

开干的时候,我以为难的是管线本身——选模型、归并身份、缓存 diff、截断长度、把成本算清楚。这些是能在白板上画完、晚上写完的部分。真正难的是方法论那一页:把它写得足够精确,让第一个对自己画像有意见的贡献者,读完它能准确指出他想反驳的是哪一步,然后跟那一步过招,而不是跟我过招。那页上每一个字都在受力——写软了,审计链断了;写硬了,没人愿意读完。这件事教我的是:

度量工程的难处不在搭管线,而在写一页能扛住第一个对自己画像不服的人的方法论。