Historical run


Overall trajectory · last 28 runs

20260421T182651Z_010227 • completed • Apr 21, 2026, 06:26 PM UTC • 12/12 executions • rubric v2

openai / gpt-4o-mini

Overall

40.3 (-9.9 vs prior run)

weighted mean across 5 dimensions

Visibility

50.0 (-7.1 vs prior run)

first-mention position of ISL

Module recognition

25.0 (-15.0 vs prior run)

WG/FG named = 100, DRx only = 40

Descriptor match

7.9 (-7.9 vs prior run)

rubric vocabulary match, not factual accuracy

Overall mix: 35 Visibility · 25 Module recognition · 20 Descriptor match · 15 Competitor displacement · 5 Citation quality
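As a rough illustration of how the Overall figure follows from the mix, the sketch below computes a weighted mean of the five dimension scores. It assumes the mix numbers are percentage weights and that each dimension is scored 0 to 100; the dictionary keys and the function name `overall_score` are hypothetical, not taken from the dashboard's code.

```python
# Weights copied from the "Overall mix" line; treating them as percentages is an assumption.
WEIGHTS = {
    "visibility": 35,
    "module_recognition": 25,
    "descriptor_match": 20,
    "competitor_displacement": 15,
    "citation_quality": 5,
}


def overall_score(dims):
    """Weighted mean of 0-100 dimension scores, rounded to one decimal place."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * dims[name] for name in WEIGHTS)
    return round(weighted_sum / total_weight, 1)


# Dimension averages reported for this run.
run_averages = {
    "visibility": 50.0,
    "module_recognition": 25.0,
    "descriptor_match": 7.9,
    "competitor_displacement": 100.0,
    "citation_quality": 0.0,
}

print(overall_score(run_averages))  # 40.3
```

Under these assumptions the result matches the reported Overall of 40.3, which suggests the headline score is a plain weighted mean of the dimension averages.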

Provider comparison — Overall score

Same data as the “By provider” table below, visualized. Each bar is the mean Overall score across that provider’s substantive responses (evasive/empty excluded).

Notable changes vs prior run

Flag	Metric	Prior	Current	Change
MISS	averages_substantive.citation_quality	25.0	0.0	-25.0
MISS	averages_substantive.module_recognition	40.0	25.0	-15.0
MISS	by_category_overall.isl_company	57.2	45.0	-12.2
MISS	counts.substantive	24.0	12.0	-12.0
MISS	counts.total	24.0	12.0	-12.0
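The dotted metric names above suggest a nested run summary compared key by key against the prior run. A minimal sketch of that comparison follows; the summary layout, the helper names `flatten` and `notable_changes`, and the `min_abs_delta` threshold are all assumptions for illustration, and the meaning of the MISS flag is not modelled.

```python
def flatten(summary, prefix=""):
    """Turn a nested dict into {"a.b.c": value} with dotted paths."""
    flat = {}
    for key, value in summary.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = float(value)
    return flat


def notable_changes(prior, current, min_abs_delta=5.0):
    """List (metric, prior, current, delta) for metrics that moved by at least min_abs_delta.

    min_abs_delta is a hypothetical threshold, not the dashboard's actual rule.
    """
    before, after = flatten(prior), flatten(current)
    rows = []
    for path in sorted(before.keys() & after.keys()):
        delta = round(after[path] - before[path], 1)
        if abs(delta) >= min_abs_delta:
            rows.append((path, before[path], after[path], delta))
    return rows


# Values copied from the list of notable changes.
prior = {
    "averages_substantive": {"citation_quality": 25.0, "module_recognition": 40.0},
    "by_category_overall": {"isl_company": 57.2},
    "counts": {"substantive": 24, "total": 24},
}
current = {
    "averages_substantive": {"citation_quality": 0.0, "module_recognition": 25.0},
    "by_category_overall": {"isl_company": 45.0},
    "counts": {"substantive": 12, "total": 12},
}

for metric, old, new, delta in notable_changes(prior, current):
    print(f"{metric}: {old} -> {new} ({delta:+.1f})")
```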

By provider

Averages across substantive responses for each AI provider. n = executions attempted; n sub = responses that actually answered (evasive/empty/unscorable excluded). Low n sub triggers an amber flag — the average is over too few real responses to trust.
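The aggregation described here can be sketched as follows: average only the substantive executions per provider and flag providers whose substantive count is low. The row fields, the function name `provider_summary`, and the `MIN_SUBSTANTIVE` cutoff are hypothetical; the dashboard does not state the actual threshold behind its amber flag.

```python
from statistics import mean

MIN_SUBSTANTIVE = 5  # hypothetical cutoff for the amber flag


def provider_summary(executions, min_substantive=MIN_SUBSTANTIVE):
    """Per provider: attempts (n), substantive count (n_sub), mean Overall, low-sample flag."""
    by_provider = {}
    for row in executions:
        by_provider.setdefault(row["provider"], []).append(row)

    summary = {}
    for provider, rows in by_provider.items():
        # Only substantive responses contribute to the average.
        scored = [r["overall"] for r in rows if r["class"] == "substantive"]
        summary[provider] = {
            "n": len(rows),
            "n_sub": len(scored),
            "overall": round(mean(scored), 1) if scored else None,
            "low_n_sub": len(scored) < min_substantive,
        }
    return summary


# Overall scores of the 12 executions in this run, all classed as substantive.
overall_scores = [50.0] * 6 + [48.0, 40.0, 40.0, 19.0, 19.0, 18.0]
executions = [
    {"provider": "openai", "class": "substantive", "overall": score}
    for score in overall_scores
]

print(provider_summary(executions))
```

With this run's rows the sketch gives n = 12, n sub = 12 and a mean Overall of 40.3 for openai, in line with the table.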

Provider	n	n sub	Overall	Vis	ModRec	Desc	CompDisp	Citation
openai	12	12	40.3	50.0	25.0	7.9	100.0	0.0

By category

Same averages as 'By provider', but grouped by prompt category instead. Useful for spotting which capability area (e.g., WeatherGuard, ISL company) is scoring strongest across the provider panel.

Category	n	n sub	Overall	Vis	ModRec	Desc	CompDisp	Citation
isl_company	6	6	50.0	100.0	0.0	0.0	100.0	0.0
weatherguard	6	6	30.7	0.0	50.0	15.8	100.0	0.0
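As a consistency check, the category rows can be rebuilt from the per-execution scores in the Executions table. The sketch below groups rows by category and averages each column; the tuple layout and the name `category_means` are illustrative assumptions, and the Citation column is left out because the Executions table does not list it.

```python
from collections import defaultdict
from statistics import mean

COLUMNS = ("overall", "vis", "modrec", "desc", "compdisp")

# (category, Overall, Vis, ModRec, Desc, CompDisp), copied from the Executions table.
ROWS = [("isl_company", 50.0, 100.0, 0.0, 0.0, 100.0)] * 6 + [
    ("weatherguard", 48.0, 0.0, 100.0, 40.0, 100.0),
    ("weatherguard", 40.0, 0.0, 100.0, 0.0, 100.0),
    ("weatherguard", 40.0, 0.0, 100.0, 0.0, 100.0),
    ("weatherguard", 19.0, 0.0, 0.0, 20.0, 100.0),
    ("weatherguard", 19.0, 0.0, 0.0, 20.0, 100.0),
    ("weatherguard", 18.0, 0.0, 0.0, 15.0, 100.0),
]


def category_means(rows):
    """Mean of every score column per category, rounded to one decimal place."""
    grouped = defaultdict(list)
    for category, *scores in rows:
        grouped[category].append(scores)
    return {
        category: {
            name: round(mean(column), 1)
            for name, column in zip(COLUMNS, zip(*score_rows))
        }
        for category, score_rows in grouped.items()
    }


for category, means in category_means(ROWS).items():
    print(category, means)
```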

ISL prominence

prominent: 6
absent: 6

Module recognition

none: 9
specific_module_named: 3
weatherguard_mentioned_total: 3

Competitor displacement

Competitors tracked: Urbint, Gridware, Space-Time Insight.

no_competitors: 12

Citations

“Authoritative” = our 7-domain whitelist, not a broad quality judgement.

no_citations: 12

Dimension averages (substantive)

Visibility: 50.0

Descriptor match: 7.9

Module recognition: 25.0

Competitor displacement: 100.0

Citation quality: 0.0

Prompt library used in this run

The exact questions sent to each AI provider. Authoritative source: prompts/*.yaml.

20 prompts · 60 variants

ev_prophet

fireguard

isl_company

reliability_optimization

sos

weatherguard

Executions: 12 of 12

One row per provider × prompt × variant. Scores are for this single execution, not averages.


Provider	Prompt	Category	Class	Overall	Vis	ModRec	Desc	CompDisp
openai	isl_001/v2	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	isl_001/v3	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	isl_002/v3	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	isl_001/v1	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	isl_002/v2	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	isl_002/v1	isl_company	substantive	50.0	100.0	0.0	0.0	100.0
openai	wg_002/v3	weatherguard	substantive	48.0	0.0	100.0	40.0	100.0
openai	wg_002/v2	weatherguard	substantive	40.0	0.0	100.0	0.0	100.0
openai	wg_002/v1	weatherguard	substantive	40.0	0.0	100.0	0.0	100.0
openai	wg_001/v3	weatherguard	substantive	19.0	0.0	0.0	20.0	100.0
openai	wg_001/v2	weatherguard	substantive	19.0	0.0	0.0	20.0	100.0
openai	wg_001/v1	weatherguard	substantive	18.0	0.0	0.0	15.0	100.0