How to Verify a Trader's Track Record Yourself — A Step-by-Step Guide
A practical, end-to-end procedure for confirming that a trader's stated performance matches an independent primary record. Covers exchange APIs, NAV reconciliation, TWR re-derivation, and chain integrity checks.
- Verifying a track record means re-deriving the stated performance from primary records, not trusting a screenshot.
- A reviewer needs four things: read access to the primary data, the methodology used, the snapshot history, and a deterministic computation tool.
- The procedure has six steps: source check, methodology check, completeness check, recomputation, chain integrity, and timeline anchor.
- On NakedPnL, every step is reproducible in a browser using the Web Crypto API and the chain bundle exposed at /verify/chain/[handle].
A claimed track record is a number on a slide deck. A verified track record is a number a third party can re-derive from the primary records that produced it, in full, without trusting the trader. Most allocators, journalists, and curious skeptics never make the leap from one to the other because the procedure looks intimidating from the outside. It is not. The work breaks into six discrete steps, each of which can be done by a single person with a laptop and a few hours.
This guide walks the procedure end to end, using NakedPnL's public chain as the running example. The same pattern applies to any verification surface that follows the four properties described in the verified track record glossary entry — independent primary data, deterministic computation, append-only history, and published methodology.
What you are actually proving
Before starting, it is worth being clear about what verification proves and what it does not. Verification confirms that a stated figure matches the primary records that produced it. It does not prove the trader will continue to perform the same way in the future. It does not prove the venue itself is honest about the underlying NAV. It does not prove the trader has connected every account they control. What it does prove is much narrower: for the accounts that are connected, over the period that is published, the figures shown match what the venue reported and the methodology says they should be.
That narrow proof is still extremely valuable. Most cases of fraudulent or misleading track records fail at exactly this step — the headline figure cannot be reconciled with the primary record. Catching that single failure mode eliminates the bulk of the bad cases an allocator or a counterparty actually encounters in practice.
Step 1 — Confirm the primary data source is independent
The first question is the most important: where do the figures come from? A self-reported P&L spreadsheet is a claim, not a record. A screenshot of a venue dashboard is a claim about a record. The actual record is what the venue's API returns when queried with credentials for the trader's account.
On NakedPnL, the primary data source is the read-only API key the trader registered. The daily snapshot cron at 23:55 UTC pulls account state — balances, equity, positions — directly from Binance, Bybit, OKX, IBKR Flex Web Service, or the prediction-market client for Kalshi and Polymarket. The raw response is canonicalised (whitespace removed, keys sorted) and SHA-256 hashed. The hash, plus the canonical response itself, is what the chain stores. A verifier who suspects the venue's response was substituted at some point can fetch a fresh snapshot from the venue with the same read-only credentials and compare.
If the trader cannot or will not produce a fresh snapshot from the venue API for cross-checking, that is itself diagnostic. A genuine read-only credential remains valid; refusing to reproduce a current snapshot indicates either the credential has been revoked (in which case the chain has not been updating) or the original data was never sourced from the venue at all.
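The mechanics of that cross-check are simple to sketch. The snippet below assumes the canonicalisation described above (sorted keys, no whitespace) and uses an illustrative response shape, not any real venue's schema:

```python
import hashlib
import json

def canonical(obj) -> str:
    """Sorted keys, no whitespace -- the canonical form described above."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':'))

def matches_chain(fresh_response: dict, stored_content_hash: str) -> bool:
    """Hash a fresh venue response and compare it to the chain's stored content hash."""
    digest = hashlib.sha256(canonical(fresh_response).encode('utf-8')).hexdigest()
    return digest == stored_content_hash

# Illustrative response; field names are hypothetical, not a real exchange schema.
snapshot = {'equity': '10250.00', 'balances': {'USDT': '10250.00'}}
stored = hashlib.sha256(canonical(snapshot).encode('utf-8')).hexdigest()

assert matches_chain(snapshot, stored)
# Key order does not matter after canonicalisation, but any changed value fails:
assert matches_chain({'balances': {'USDT': '10250.00'}, 'equity': '10250.00'}, stored)
assert not matches_chain({'equity': '10251.00', 'balances': {'USDT': '10250.00'}}, stored)
```

Note that a fresh snapshot will normally differ from a historical one because balances move; the point of the cross-check is that the credential still works and the response shape and magnitudes reconcile with the chain, not byte-for-byte equality with an old row.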
Step 2 — Confirm the methodology is documented and appropriate
The headline performance figure is a function of the underlying NAV series and the methodology applied to it. Different methodologies give different numbers, and not all of them are equally honest for the question being asked. The two main alternatives for fund-level performance are time-weighted return (TWR) and money-weighted return (MWR, equivalent to internal rate of return).
TWR removes the impact of cash-flow timing — a deposit of new capital does not flatter the manager's reported skill. MWR incorporates cash-flow timing — it answers the question 'what was my actual outcome as the investor in this fund'. For evaluating manager skill on a like-for-like basis, GIPS requires TWR. For evaluating a single investor's wallet result, MWR is appropriate. NakedPnL publishes TWR because it is the metric that survives manager-to-manager comparison.
A reviewer should look for explicit answers to four methodology questions: what metric is reported, what cadence (daily, monthly, quarterly), what cash-flow handling, and what numerical precision. NakedPnL's answers are: TWR, daily snapshots at 23:55 UTC, sub-period termination at every external flow per GIPS, and Decimal.js with 28 decimal digits of precision. The methodology guide on TWR explained walks the calculation step by step. The guide on decimal precision explains why Decimal.js is necessary for repeatable results across thousands of compounded steps.
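The TWR-versus-flows distinction can be made concrete with a toy series (all values illustrative): NAV grows 100 to 110, the investor deposits 100, and the post-deposit base of 210 grows to 231. TWR chain-links the two 10% sub-periods, while a naive profit-over-starting-capital figure is inflated by the deposit:

```python
from decimal import Decimal as D, getcontext

getcontext().prec = 28

# Sub-period 1: 100 -> 110 with no flow; sub-period 2: post-deposit 210 -> 231.
r1 = (D('110') - D('100')) / D('100')   # 0.10
r2 = (D('231') - D('210')) / D('210')   # 0.10

# TWR: geometric chain-link of sub-period returns, terminated at the flow.
twr = (D(1) + r1) * (D(1) + r2) - D(1)
assert twr == D('0.21')

# Naive 'profit over starting capital' counts the deposit as if it were skill:
naive = (D('231') - D('100') - D('100')) / D('100')
assert naive == D('0.31')
assert naive > twr
```

The deposit changed the naive figure from 21% to 31% without the manager doing anything differently, which is exactly the distortion sub-period termination removes.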
Step 3 — Check completeness over the claimed period
A track record covering 18 months that is missing 27 days is not the same as one that is whole. Gaps in the snapshot series are common — exchange API outages, maintenance windows, network issues — and an honest registry records them explicitly rather than backfilling with interpolated values. The completeness check is therefore not 'were there gaps?' (there usually were) but 'are the gaps disclosed and bounded?'.
On NakedPnL, every NavSnapshot row has a date and a status. Missing days appear as gap markers in the chain rather than as fabricated values. A reviewer can pull the chain bundle and tally the disclosed days against the claimed period: a 540-day period should have at most 540 snapshot rows, and any deficit corresponds to disclosed gaps. If the published TWR claims to span 540 days but the chain only contains 480 rows with no gap markers for the missing 60 days, the figure has a problem the reviewer needs to investigate.
| Check | What to look for | Red flag |
|---|---|---|
| Period bounds | Earliest and latest snapshot dates | Earliest date is far later than the trader claimed start |
| Snapshot count | Number of NavSnapshot rows | Materially fewer than period length × cadence |
| Gap markers | Explicit gap entries in the chain | Suspiciously smooth series with no gaps in a 24/7 market |
| Account coverage | Number of connected venues | Single connected account when the trader claims multi-venue activity |
| Verification depth | Bronze / Silver / Gold tier indicator | Lower tier than the trader's marketing implies |
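The tally itself is a few lines of code. The row shape below (`date` plus a `status` of `'ok'` or `'gap'`) is an illustrative simplification of the NavSnapshot rows described above, not the exact bundle schema:

```python
from datetime import date, timedelta

def completeness_report(rows, start: date, end: date) -> dict:
    """Tally snapshot and gap rows against the claimed period.

    rows: list of {'date': date, 'status': 'ok' | 'gap'} (illustrative shape).
    'undisclosed' counts days that are neither snapshotted nor marked as gaps --
    the red-flag case from the table above.
    """
    expected = (end - start).days + 1
    ok = sum(1 for r in rows if r['status'] == 'ok')
    gaps = sum(1 for r in rows if r['status'] == 'gap')
    return {'expected': expected, 'ok': ok, 'gaps': gaps,
            'undisclosed': expected - ok - gaps}

# Ten-day claimed period: eight snapshots, one disclosed gap, one silent hole.
start, end = date(2024, 1, 1), date(2024, 1, 10)
rows = [{'date': start + timedelta(days=i), 'status': 'ok'} for i in range(8)]
rows.append({'date': start + timedelta(days=8), 'status': 'gap'})

report = completeness_report(rows, start, end)
assert report == {'expected': 10, 'ok': 8, 'gaps': 1, 'undisclosed': 1}
```

A non-zero `undisclosed` count is the question to take back to the trader before looking at any performance figure.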
Step 4 — Recompute the figure from primary data
This is the step that converts verification from a procedural review into a re-runnable computation. A reviewer takes the canonical NAV series from the chain bundle, runs it through the documented TWR algorithm, and compares the output to the published figure. If they match to the precision the original engine used, the figure is reproduced. If they do not, the reviewer has a specific discrepancy to investigate.
The reference implementation is documented at /docs/verification with snippets in both Python and JavaScript. Python is convenient for offline review; JavaScript runs in the browser via the Web Crypto API for instant verification on the /verify/chain/[handle] page. The full algorithm fits in roughly 30 lines and uses arbitrary-precision decimal arithmetic to avoid floating-point drift across thousands of compounded daily steps.
```python
from decimal import Decimal, getcontext
import hashlib, json

getcontext().prec = 28

def canonical(obj):
    """Sort keys, no whitespace -- same canonical form NakedPnL uses."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':'))

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode('utf-8')).hexdigest()

def recompute_chain(snapshots):
    """Re-derive content + chain hashes from raw snapshots.

    snapshots: list of {date, raw_response, flow}.
    Returns list of {date, content_hash, chain_hash}.
    Compare to the chain bundle to confirm integrity.
    """
    out = []
    prev_chain = 'genesis'
    for s in snapshots:
        content_hash = sha256_hex(canonical(s['raw_response']))
        chain_hash = sha256_hex(prev_chain + content_hash)
        out.append({'date': s['date'],
                    'content_hash': content_hash,
                    'chain_hash': chain_hash})
        prev_chain = chain_hash
    return out

def twr(navs_with_flows):
    """Geometric chain-link of sub-period returns per GIPS.

    navs_with_flows: list of (nav_before_flow, flow_on_day),
    earliest first. Returns total period TWR as Decimal.
    """
    growth = Decimal(1)
    base = navs_with_flows[0][0] + navs_with_flows[0][1]
    for _prev, curr in zip(navs_with_flows, navs_with_flows[1:]):
        end = curr[0]  # NAV before the day's flow
        if base <= 0:
            raise ValueError('non-positive sub-period base')
        sub_return = (end - base) / base
        growth *= (Decimal(1) + sub_return)
        base = curr[0] + curr[1]  # post-flow base for next sub-period
    return growth - Decimal(1)
```

The same procedure runs in the browser using crypto.subtle.digest('SHA-256', ...) and BigInt arithmetic for the TWR computation. The /verify/chain/[handle] page does this end-to-end on the user's machine; no trust in NakedPnL's servers is required for the recomputation step.
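Before pointing the reference snippet at real bundle data, it is worth sanity-checking it on a toy series. The block below repeats the `twr` function so it runs standalone; the three-day series (a 10% move, a 50 deposit, another 10% move) is illustrative:

```python
from decimal import Decimal as D, getcontext

getcontext().prec = 28

def twr(navs_with_flows):
    """Chain-linked TWR, same logic as the reference snippet above."""
    growth = D(1)
    base = navs_with_flows[0][0] + navs_with_flows[0][1]
    for _prev, curr in zip(navs_with_flows, navs_with_flows[1:]):
        end = curr[0]  # NAV before the day's flow
        if base <= 0:
            raise ValueError('non-positive sub-period base')
        growth *= D(1) + (end - base) / base
        base = curr[0] + curr[1]  # post-flow base for next sub-period
    return growth - D(1)

# (nav_before_flow, flow_on_day), earliest first:
# 100 -> 110 (+10%), then a 50 deposit, then 160 -> 176 (+10%).
series = [(D('100'), D('0')), (D('110'), D('50')), (D('176'), D('0'))]
assert twr(series) == D('0.21')
```

Two 10% sub-periods compound to 21% regardless of the deposit in between, which is the behaviour a reviewer should confirm before trusting the tool on a real NAV series.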
Step 5 — Check chain integrity
Recomputing today's figure is not enough on its own. A reviewer also needs to confirm that the historical chain has not been edited since publication. This is where the SHA-256 chain pays off: each chain header is a hash of the previous header concatenated with the current row's content hash. If any historical row is changed, every chain header from that row forward changes, and the discrepancy is immediately visible.
The integrity check is mechanical. Walk the chain from the genesis row to the latest entry. For each row, recompute the chain header from the previous chain header plus the current content hash. Compare to the stored chain header. Any divergence localises to a specific row — that row, or one before it, has been edited. The methodology guide on SHA-256 hash chains documents the algorithm in more depth.
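The walk described above can be sketched in a few lines. The row shape is the same illustrative one used earlier (stored `content_hash` and `chain_hash` per row), and the `'genesis'` seed mirrors the reference snippet:

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode('utf-8')).hexdigest()

def first_divergence(rows):
    """Walk the chain from genesis; return the index of the first row whose
    stored chain header disagrees with the recomputed one, or None if intact.
    """
    prev = 'genesis'
    for i, row in enumerate(rows):
        if sha256_hex(prev + row['content_hash']) != row['chain_hash']:
            return i
        prev = row['chain_hash']
    return None

# Build a tiny three-row chain, then tamper with the middle row.
contents = [sha256_hex(x) for x in ('a', 'b', 'c')]
rows, prev = [], 'genesis'
for c in contents:
    h = sha256_hex(prev + c)
    rows.append({'content_hash': c, 'chain_hash': h})
    prev = h

assert first_divergence(rows) is None
rows[1]['content_hash'] = sha256_hex('tampered')
assert first_divergence(rows) == 1  # divergence localises to the edited row
```

Note the walk reports the first inconsistent row; as the text says, the actual edit is at that row or one before it, since an attacker who rewrites a row and all subsequent headers shifts the divergence to wherever they stopped rewriting.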
Step 6 — Confirm the timeline against an external anchor
Steps 4 and 5 confirm internal consistency: the chain is well-formed and the published figure matches the data. They do not, on their own, prove that the data existed when it claims to have existed. A motivated registry operator could in principle generate an entire chain after the fact and label it with backdated dates. The defence against this is an external anchor — a record of the chain's state at a point in time that the registry cannot edit retroactively.
NakedPnL anchors the daily Merkle root of all chain heads to Bitcoin via OpenTimestamps. Each day's Merkle root is committed as an OpenTimestamps proof; once the proof is upgraded with a Bitcoin block height, the root is bound to a specific block in Bitcoin's history. A reviewer who fetches the OTS proof for any past date can confirm the Merkle root in the proof matches the chain heads that existed on that date — and confirm the proof is bound to a Bitcoin block from the same era. Editing the chain after the fact would either break the recomputed Merkle root or invalidate the OTS proof against Bitcoin.
The verify endpoint at /api/verify/[date] returns the OTS proof, the Merkle root, and the underlying chain heads for any date in the registry's history. The verification methodology guide on OpenTimestamps Bitcoin anchoring documents the full anchoring procedure.
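The reviewer's side of the anchor check is recomputing the Merkle root from the published chain heads and comparing it to the root inside the OTS proof. The sketch below uses a common pairing rule (left-to-right, duplicating the last leaf on odd levels); NakedPnL's exact pairing and byte-encoding rules are an assumption here and should be taken from /docs/verification:

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode('utf-8')).hexdigest()

def merkle_root(leaves):
    """Illustrative Merkle root over the day's chain heads.
    Pairing rule (duplicate-last-on-odd) is an assumption, not the spec."""
    if not leaves:
        raise ValueError('no leaves')
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last leaf on odd-sized levels
        level = [sha256_hex(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

heads = [sha256_hex(h) for h in ('head-1', 'head-2', 'head-3')]
root = merkle_root(heads)

# Deterministic: the same heads always reproduce the same root...
assert merkle_root(heads) == root
# ...and editing any head changes the root, so it would no longer match
# the root bound into the Bitcoin-anchored OTS proof.
assert merkle_root([sha256_hex('edited')] + heads[1:]) != root
```

Matching the recomputed root against the proof, and the proof against Bitcoin, is what closes the loop: the chain heads demonstrably existed by the time of the anchoring block.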
Putting the six steps together
A complete verification of a NakedPnL track record is six discrete checks: (1) source independence — the data came from the venue, not the trader's spreadsheet; (2) methodology — TWR per GIPS, with documented cash-flow handling; (3) completeness — the chain covers the claimed period with disclosed gaps; (4) recomputation — the published TWR matches what the algorithm produces from the primary NAV series; (5) chain integrity — every chain header is consistent with the prior header and the current content hash; (6) timeline anchor — the OTS proof binds the chain head to a Bitcoin block from the claimed publication date.
Pass all six and the headline figure is reproducible from primary records. Fail any one and the reviewer has a specific question to take back to the trader. This is the workflow allocators run for serious due diligence; the only difference between this procedure and a traditional GIPS verification engagement is that the cryptographic version is automatable and fits in a browser tab.
What to do when verification fails
Failure at each step has a different meaning:
- Source independence (Step 1): the most serious failure. The figure is not derived from the venue's record at all.
- Methodology (Step 2): usually means the trader is reporting a different metric than is being claimed; demand TWR if the marketing implies manager skill.
- Completeness (Step 3): often points to undisclosed gaps that flatter the headline; ask for a gap-by-gap accounting.
- Recomputation (Step 4): precise. The algorithm and the data disagree; one of them is wrong.
- Chain integrity (Step 5): localises to a specific row that has been edited.
- Timeline anchor (Step 6): suggests the chain was not anchored when it claims to have been.
Each failure has a specific corrective action; vague claims about 'verification issues' are not enough.
What this procedure does not solve
Verification is not a forecast and not a survivorship-bias correction. A trader who connects only their winning accounts has a verified record of the winners and silence about the losers. The verification cannot tell you whether the trader has five other accounts you are not seeing. It also cannot tell you whether the strategy that produced the historical numbers will continue to work. The methodology guide on survivorship bias covers the structural limits.
Within those limits, verification still does what it is supposed to: it eliminates the most common single failure mode of bad track records, which is that the published number does not match the primary record. Catching that one failure removes most of the cases that allocators encounter in practice. The rest is judgment.