Engineering output is mostly invisible to non-engineers. I designed a four-phase pipeline that reads commits with Claude, generates contributor profiles, and exposes the methodology so the judgment is auditable, not vibes.
I was inside a fast-growing Series A AI company where headcount had passed the point at which any one person could still remember who shipped what. Engineering work was spread across many repos and many time zones, on a release cadence that nobody outside the team could narrate.
The consequence was predictable. Performance conversations and recognition decisions kept landing on the wrong axes — visible PR counts, who spoke up in meetings, who happened to sit next to whom. The actual shape of contribution was not on the table, because the table did not have it.
I wanted one artifact that turned the company's real GitHub activity into something a leadership team could read in five minutes — and that a contributor could read about themselves and recognize as fair.
GitHub will hand you 16 repos and tens of thousands of commits across a few dozen contributors. What it will not hand you is shape. Raw PR counts are a misleading proxy — one small bugfix can outweigh a month of careful refactor work. Reviews and issues matter, often more than authored code, and live in entirely different APIs. Reading commit diffs by hand does not scale past a single repo, let alone sixteen.
The hard problem was never "look at GitHub." The hard problem was turning diffs into substantively-described work in a way that survives audit — where every label on a person traces back to the line of code that produced it, and where the methodology itself can be argued with on its own page.
The system had to do four things at once: read code carefully, attribute it to the right person, aggregate it into a profile, and show its work. Skip any of them and the output is just another dashboard nobody trusts.
Reconcile GitHub handles against the internal roster so every contribution lands on the right person. Engineers use multiple handles, bot accounts show up, and squash merges erase co-authors. Without this layer every downstream phase is noise dressed up as signal, so I treated it as the foundation, not an afterthought.
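A minimal sketch of this reconciliation layer, assuming the roster can be expressed as a mapping from person to known handles. `ROSTER`, `is_bot`, and `resolve_handle` are hypothetical names for illustration, not the pipeline's actual code, and the bot check is only a heuristic.

```python
from typing import Optional

# Hypothetical roster: internal person ID -> every GitHub handle they use.
ROSTER = {
    "alice": {"alice-gh", "alice-work"},
    "bob": {"bobdev"},
}

# Invert once for fast lookups. GitHub handles are case-insensitive,
# so everything is compared in lowercase.
HANDLE_TO_PERSON = {
    handle.lower(): person
    for person, handles in ROSTER.items()
    for handle in handles
}


def is_bot(handle: str) -> bool:
    """Heuristic: GitHub App accounts end in "[bot]", e.g. dependabot[bot]."""
    return handle.lower().endswith("[bot]")


def resolve_handle(handle: str) -> Optional[str]:
    """Return the roster person for a handle, or None for bots and unknowns."""
    if is_bot(handle):
        return None
    return HANDLE_TO_PERSON.get(handle.lower())
```

Returning `None` for unknown handles, rather than guessing, keeps unresolved identities visible so they can be fixed in the roster instead of silently landing on the wrong person.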
Pull merged PRs from each tracked repo via the GitHub API, walk every commit inside them, and attribute work at the PR level: one PR fans out to many commits and resolves to one author chain. This shape matches how the team actually ships, which matters more than raw commit volume.
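One way to picture the "one PR, many commits, one author chain" shape is the sketch below. The `Commit` and `PullRequest` types and the `author_chain` function are hypothetical; the only real convention relied on is Git's `Co-authored-by:` commit trailer. Trailers identify people by email rather than handle, so in practice they would still need to pass through the identity layer.

```python
import re
from dataclasses import dataclass, field

# Git's conventional trailer for crediting additional authors.
CO_AUTHOR_RE = re.compile(
    r"^Co-authored-by:\s*(.+?)\s*<([^>]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class Commit:
    sha: str
    author: str   # GitHub handle of the commit author
    message: str


@dataclass
class PullRequest:
    number: int
    author: str   # handle of whoever opened the PR
    commits: list[Commit] = field(default_factory=list)


def author_chain(pr: PullRequest) -> list[str]:
    """Ordered, de-duplicated contributors to a PR.

    The PR opener comes first, then commit authors, then any
    co-authors named in commit trailers (identified by email).
    """
    chain = [pr.author]
    for commit in pr.commits:
        chain.append(commit.author)
        chain.extend(email for _name, email in CO_AUTHOR_RE.findall(commit.message))
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(chain))
```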
Supplement PRs with review counts and issue activity through the Search API, sidecar to the main pipeline. Some of the most central engineers on a team review heavily and author lightly — they are invisible to PR-count metrics and conspicuous the moment reviews enter the frame.
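As an illustration of what such a sidecar query could look like, the sketch below builds request URLs for GitHub's issue and pull request search endpoint, whose JSON response includes a `total_count` field. The function names, the organization scope, and the date window are assumptions. Note that `reviewed-by:` counts PRs a person reviewed, not individual review comments.

```python
from urllib.parse import urlencode

SEARCH_ISSUES_URL = "https://api.github.com/search/issues"


def search_count_url(query: str) -> str:
    """Build a search URL; only `total_count` is needed, so ask for one result."""
    return f"{SEARCH_ISSUES_URL}?{urlencode({'q': query, 'per_page': 1})}"


def review_count_url(org: str, handle: str, since: str) -> str:
    """Merged PRs in `org` reviewed by `handle`, merged on or after `since` (YYYY-MM-DD)."""
    query = f"org:{org} type:pr is:merged reviewed-by:{handle} merged:>={since}"
    return search_count_url(query)


def issue_activity_url(org: str, handle: str, since: str) -> str:
    """Issues in `org` that involve `handle`, created on or after `since`."""
    query = f"org:{org} type:issue involves:{handle} created:>={since}"
    return search_count_url(query)
```

The Search API has its own rate limit, separate from the core REST limit, which is one practical reason to run this as a sidecar rather than inline with PR ingestion.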
Run each commit diff through Claude Haiku to extract structured signal per commit: substantiveness, work type (feature / bug / chore / refactor), and auto-generated-code ratio. This is where the actual reading happens. Diffs are cached to Postgres and truncated at 30K characters so a re-run does not re-fetch from GitHub and does not blow up on a single oversized commit.
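The caching, truncation, and structured-output pieces could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: SQLite stands in for Postgres so the example runs anywhere, the JSON field names and 0.0 to 1.0 scales are hypothetical, the choice to truncate before caching is a guess, and the model call itself is left out.

```python
import json
import sqlite3
from dataclasses import dataclass

MAX_DIFF_CHARS = 30_000  # truncation limit stated in the text
WORK_TYPES = {"feature", "bug", "chore", "refactor"}


@dataclass
class CommitSignal:
    substantiveness: float   # assumed 0.0-1.0 scale
    work_type: str
    generated_ratio: float   # assumed share of the diff that is auto-generated


def truncate_diff(diff: str) -> str:
    """Cap a diff so one oversized commit cannot blow up the request."""
    return diff[:MAX_DIFF_CHARS]


def parse_signal(raw_json: str) -> CommitSignal:
    """Validate the model's JSON reply before trusting it downstream."""
    data = json.loads(raw_json)
    if data["work_type"] not in WORK_TYPES:
        raise ValueError(f"unknown work type: {data['work_type']!r}")
    return CommitSignal(
        float(data["substantiveness"]),
        data["work_type"],
        float(data["generated_ratio"]),
    )


class DiffCache:
    """Diff cache keyed by commit SHA (SQLite here as a stand-in for Postgres)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS diffs (sha TEXT PRIMARY KEY, diff TEXT)"
        )

    def get_or_fetch(self, sha: str, fetch) -> str:
        """Return the cached diff, calling `fetch(sha)` only on a cache miss."""
        row = self.conn.execute(
            "SELECT diff FROM diffs WHERE sha = ?", (sha,)
        ).fetchone()
        if row is not None:
            return row[0]
        diff = truncate_diff(fetch(sha))
        self.conn.execute("INSERT INTO diffs (sha, diff) VALUES (?, ?)", (sha, diff))
        return diff
```

Because a commit SHA identifies immutable content, the cache never needs invalidation: a re-run with a changed prompt can reuse every stored diff.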
Aggregate the Phase 2 output per contributor and have Claude Sonnet synthesize a profile: a title-like descriptor, a centrality estimate, a role read, and a short list of key deliverables. Sonnet does not see raw diffs; it sees only the already-summarized commits. That ordering is the whole point.
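To make the ordering concrete, here is a small sketch of how per-commit summaries might be grouped into the compact payload the synthesis model receives. The field names (`person`, `work_type`, `substantiveness`, `summary`) and the function `build_synthesis_inputs` are hypothetical. The point is only that no raw diff text reaches this stage.

```python
from collections import Counter, defaultdict


def build_synthesis_inputs(commit_summaries: list[dict]) -> dict[str, dict]:
    """Group per-commit summaries by contributor into one payload per person."""
    grouped = defaultdict(list)
    for item in commit_summaries:
        grouped[item["person"]].append(item)

    payloads = {}
    for person, items in grouped.items():
        payloads[person] = {
            "commit_count": len(items),
            # How the person's work splits across feature / bug / chore / refactor.
            "work_type_mix": dict(Counter(i["work_type"] for i in items)),
            "mean_substantiveness": sum(i["substantiveness"] for i in items) / len(items),
            # Short text summaries only; raw diffs never reach this stage.
            "summaries": [i["summary"] for i in items],
        }
    return payloads
```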
Haiku is cheap and runs across many diffs; Sonnet is the synthesis pass over already-summarized work. The split materially controls cost — the cheap model does the wide read, the expensive one does the narrow write. Collapsing them into a single model either inflates spend by an order of magnitude or sacrifices the substance of the per-commit read.
The interesting outcome was not in the dashboard. It was upstream of the dashboard, in how leadership conversations actually moved. Discussions stopped routing through "how many PRs did X merge this quarter" and started routing through Phase 3 profile reads — the title-like descriptor, the centrality estimate, the role read, the short list of key deliverables. The shape of the contribution finally fit on the page.
The methodology page did the second piece of work. It became the trust contract: any contributor could read why the profile said what it said, trace the label back to the commits it summarized, and disagree on specifics rather than reject the whole thing. That is a different shape of conversation than the one a closed dashboard produces.
I started this build thinking the hard part was the pipeline — model selection, identity resolution, the diff cache, the truncation math, the cost shape. Those were the parts I could plan on a whiteboard and ship in evenings. The hard part turned out to be the methodology page: writing it precisely enough that the first contributor who disagreed with their own profile could read it, find the exact step they wanted to argue with, and argue with that step rather than with me. Every word on that page was load-bearing — softening it killed the audit trail, hardening it killed the willingness to read it. The lesson:
The hard part of measuring engineering isn't building the pipeline — it's writing the methodology page that survives the first person who disagrees with their own profile.