Ask Dashboard - Real-time Q&A Monitoring

Overview

This dashboard monitors questions asked through the Ask module in real-time. It displays questions, answers, source articles, named entities, and performance metrics.

Header Controls

Auto-refresh: Toggle to enable/disable automatic data refresh
Interval: Set how often the dashboard refreshes (1-15 minutes)
Last updated: Shows when data was last fetched

Summary Statistics

Total Questions: Number of questions in the current view
Backstop Queries: Questions that required fallback processing
Unique Cited Domains: Count of distinct domains actually cited in answers. Hover over this box to see a breakdown showing each domain, how many times it was cited, and its average Domain Reliability (DR) score.
Last updated: Shows when the dashboard last checked for new questions

Date Range Selection:

Preset Buttons: Quick selection options - Today, Yesterday, Last 7/15/30/60/90 days (default: Last 15 days)
Custom Date Range: Enter specific start and end dates, then click "Apply" to filter. Click "Clear" to return to preset mode.
Mutual Exclusivity: Preset buttons and custom dates are mutually exclusive - selecting one disables the other.
Auto-refresh Indicator: Shows whether auto-refresh is enabled (green dot = ON, gray dot = OFF).

Auto-refresh Behavior:

Auto-refresh ON: Today, Last 7, Last 15, Last 30, Last 60, Last 90 days (live monitoring)
Auto-refresh OFF: Yesterday and Custom date ranges (historical data, no updates expected)

Performance metrics are split into two rows:

Non-Backstop row (green): Averages for queries answered without fallback
Backstop row (red): Averages for queries that required fallback processing

Each row shows: Total, Search, Answer, Suggest, Other, Words, W/Sec, Cites, and Avg Score (average article score of cited sources).

Note: Click section headers to expand/collapse the Category/Classification Breakdown Tables and Alt6 Ranking Analysis sections.

Category Breakdown table:

Name: The question category (e.g., Politics, Sports, Technology)
#: Number of questions in that category
Avg Cited DR: Average domain reliability of cited sources, averaged across all questions in the category
Avg Cited Score: Average production score of cited sources, averaged across all questions in the category
Avg Top 8 DR: Average domain reliability of top 8 sources by rank, averaged across all questions in the category
Avg Top 8 Score: Average production score of top 8 sources by rank, averaged across all questions in the category
Avg Alt1 Score: Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
Avg Alt1 Top 8 DR: Average domain reliability of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
Avg Alt2 Score: Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
Avg Alt2 Top 8 DR: Average domain reliability of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)

Click the sort arrows on any column header to sort the table.

Classification Breakdown table:

Name: The question classification (e.g., verified, unverified)
#: Number of questions with that classification
Avg Cited DR: Average domain reliability of cited sources, averaged across all questions with that classification
Avg Cited Score: Average production score of cited sources, averaged across all questions with that classification
Avg Top 8 DR: Average domain reliability of top 8 sources by rank, averaged across all questions with that classification
Avg Top 8 Score: Average production score of top 8 sources by rank, averaged across all questions with that classification
Avg Alt1 Score: Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
Avg Alt1 Top 8 DR: Average domain reliability of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
Avg Alt2 Score: Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
Avg Alt2 Top 8 DR: Average domain reliability of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)

Click the sort arrows on any column header to sort the table. Default sort is by count descending.

Alt6 Ranking Analysis (Temporal Decay vs Prod):

Compares how Alt Score 6 (temporal decay scoring) ranks sources differently from Production. Alt6 applies subclass-specific half-life decay to relevance scores, which can significantly change rankings for time-sensitive content.

Two-Tier Ranking System: Alt6 ranking uses the same two-tier approach as Production ranking:

Tier 1 (Top positions): Non-excluded sources ranked by Alt6 score (highest = rank 1)
Tier 2 (Bottom positions): Excluded sources ranked by Alt6 score among themselves
An excluded source with a high Alt6 score will always rank below a non-excluded source with a lower Alt6 score
Exclusion criteria: backend is_excluded flag, classification (adult/conspiracy/gambling), and user-configured thresholds (DR, Score, Semantic, etc.)
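
The two-tier ordering described above can be sketched in Python. This is a minimal illustration, not the dashboard's actual code: the Source class and its field names (source_id, alt6_score, excluded) are hypothetical, and excluded is assumed to already combine the backend flag, classification, and threshold checks.

```python
from dataclasses import dataclass


@dataclass
class Source:
    source_id: str     # hypothetical identifier
    alt6_score: float  # Alt6 score for this source
    excluded: bool     # assumed: True if any exclusion criterion applies


def two_tier_rank(sources):
    """Order sources so every non-excluded source precedes every excluded one.

    Within each tier, sources are sorted by Alt6 score, highest first.
    """
    # False sorts before True, so non-excluded sources form Tier 1.
    return sorted(sources, key=lambda s: (s.excluded, -s.alt6_score))


sources = [
    Source("a", 0.91, True),   # excluded, despite the highest score
    Source("b", 0.40, False),
    Source("c", 0.75, False),
]
ranked = two_tier_rank(sources)
ranks = {s.source_id: i + 1 for i, s in enumerate(ranked)}
```

In this example the excluded source "a" lands in the last position even though its Alt6 score is the highest of the three.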

Ranking Shift Metrics:

Avg absolute change: Average number of slots sources moved (regardless of direction)
Avg improvement/worsening: Average slot change for sources that improved/worsened
% improved/worsened: Percentage of sources that moved up/down in ranking
Bottom 1/3 -> Top 1/3: Sources that jumped from bottom third to top third
Top X dropped D+ slots: Top-ranked sources that dropped significantly
Bottom 50% -> Top X: Sources from bottom half that made it to top X (only for questions with 50+ sources)
Jumped/Dropped K+ slots: Sources with significant rank changes

Configure thresholds (Top X, K, D) in Ranking Metrics Settings. Summary shows percentages; per-question shows raw counts.
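
As a rough sketch of how several of these metrics could be computed for one question, assuming ranks are available as dictionaries keyed by source id. The function name, the dictionary layout, and the default K are illustrative assumptions, and the dashboard's exact definitions (for example, how thirds are rounded) may differ.

```python
def ranking_shift_metrics(prod_ranks, alt_ranks, k=5):
    """Compare two rankings given as {source_id: rank}, where rank 1 is best.

    A positive delta means the source moved up under the alternative ranking.
    """
    ids = list(prod_ranks)
    n = len(ids)
    if n == 0:
        return {}
    deltas = {s: prod_ranks[s] - alt_ranks[s] for s in ids}
    improved = [d for d in deltas.values() if d > 0]
    worsened = [-d for d in deltas.values() if d < 0]
    third = n / 3  # assumption: thirds are not rounded
    bottom_to_top = sum(
        1 for s in ids if prod_ranks[s] > n - third and alt_ranks[s] <= third
    )
    return {
        "avg_abs_change": sum(abs(d) for d in deltas.values()) / n,
        "avg_improvement": sum(improved) / len(improved) if improved else 0.0,
        "avg_worsening": sum(worsened) / len(worsened) if worsened else 0.0,
        "pct_improved": 100.0 * len(improved) / n,
        "pct_worsened": 100.0 * len(worsened) / n,
        "bottom_third_to_top_third": bottom_to_top,
        "jumped_k_plus": sum(1 for d in deltas.values() if d >= k),
        "dropped_k_plus": sum(1 for d in deltas.values() if d <= -k),
    }


# Six sources; "f" jumps from last place to first under the alternative ranking.
prod = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
alt = {"f": 1, "a": 2, "b": 3, "c": 4, "d": 5, "e": 6}
metrics = ranking_shift_metrics(prod, alt, k=5)
```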

Ranking Shift Metrics (per question):

Expand this section on any question card to see detailed ranking shift metrics for that specific question, comparing Alt6 vs Production ranking.

Master Article Decay Score Settings:

Configure alternative decay score formulas. The decay formula is: Decay Score = Base Score × e^(-(λ × (t^p)))

Prod Decay Score: Read-only display of production values (λ=0.03, p=1.25)
Alt Decay Score 1 & 2: Configure custom decay formulas

Variables (nested dependencies):

Base Score: Semantic/embedding score (always independently selectable)
e (Euler's number): Constant 2.71828 - enables exponential decay
^(-(exponent)): Negative exponential (requires e to be checked)
λ (Decay Rate): Controls decay speed (requires ^(-(exponent)))
t (Days Since Publication): Time variable (requires ^(-(exponent)))
p (Power): Exponent for time (requires t)

Check the Save checkbox to persist your configuration across page refreshes.

Alt Decay Columns (AD1 & AD2):

When Alt Decay Score 1 or 2 is enabled, two new columns appear in the Sources table: AD1 and AD2. These columns display:

Computed Value: The decay score calculated using your configured formula (e.g., 0.058)
Formula Breakdown: Shows the actual values used in the computation (e.g., 0.923 × e^(-0.05 × 14.5^1.5))
Column Header Tooltip: Hover over AD1/AD2 header to see the configured formula

Key Details:

Alt Decay uses the raw semantic score (not percentalized)
Days since publication (t) is calculated from source's published_at vs question's asked_at
Columns update immediately when you change configuration values while toggle is ON
When toggle is OFF, columns display "-" for all sources

Show PCT Mode:

When "Show PCT" is enabled, Alt Decay columns display percentalized values (0-1 range)
Percentalization uses min-max normalization across all sources within the question
Column headers show a "(%)" suffix when PCT mode is ON
Highest Alt Decay value becomes 1.0, lowest becomes 0.0
In PCT mode, formula breakdown is hidden (only the percentalized value is shown)
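
The min-max normalization behind PCT mode can be sketched as below. The function name is hypothetical, and the handling of the degenerate case where all values are equal is an assumption, since the text does not say what the dashboard shows then.

```python
def min_max_normalize(values):
    """Rescale values to the 0-1 range: lowest becomes 0.0, highest becomes 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, every source maps to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Alt Decay values for the sources of one question (made-up numbers).
normalized = min_max_normalize([0.25, 0.5, 0.75])
```

Because the normalization runs per question, the same raw Alt Decay value can map to different PCT values in different questions.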

Filters

User: Filter questions by specific user
Classification: Filter by question type (Investigative, Temporally-Aware, etc.)
Category: Filter by content category (politics, business, sports, etc.)
Backstop: Filter by backstop status (All, Non-Backstop, or Backstop only)

All metrics and statistics update dynamically based on the current filter selection.

Question Cards

Each question is displayed as a card with expandable sections:

Answer: The generated response with citation numbers (tabs for Alt Answers in header)
Sources: Source articles used (click headers to expand)
Suggestions: Follow-up questions
Named Entities: People, organizations, locations identified
Performance Metrics: Timing data for the query
Ranking Shift Metrics: How Alt Score rankings compare to Prod rankings

Answer Section (Three-Column Layout):

The Answer section displays three columns side-by-side for easy comparison:

Answer (Column 1): The production answer from the original query
Alt Answer 1 (Column 2): Generate an alternative answer using top 8 sources ranked by Alt Score 1
Alt Answer 2 (Column 3): Generate an alternative answer using top 8 sources ranked by Alt Score 2

How to use the Answer Section:

Expand/Collapse: Click anywhere on the Answer header to toggle the section
Side-by-Side Comparison: All three answers are visible simultaneously when expanded
Vertical Dividers: Columns are separated by vertical lines for clarity
Equal Width: Each column takes 33% of the width

How Alt Answers work:

Alt Score 1 or 2 must be configured in Master Article Score Settings to enable generation
Click "Generate Alt Answer" to call the API with the top 8 sources by Alt Rank
Two-tier ranking ensures the top 8 sources are always non-excluded (excluded sources are ranked at the bottom)
The answer is generated using the same question but with different source articles
Generated answers are cached for the session (not persisted across page refreshes)

Question Badges

CATEGORY Content category (politics, sports, etc.)

CLASSIFICATION Question type classification

BACKSTOP Question required fallback processing

Avg Prod Cited Score: 0.XXX Average production score of cited sources only

Avg Prod Top 8 Score: 0.XXX Average production score of top 8 sources by rank (or all sources if fewer than 8)

Avg Alt1 Top 8 Score: 0.XXX Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt1 not configured)

Avg Alt2 Top 8 Score: 0.XXX Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt2 not configured)

Sources Section Badges

These badges appear in the Sources section for each question:

Avg Prod Cited Domain Reliability: 0.XXX Average domain reliability of cited sources only

Avg Prod Top 8 Domain Reliability: 0.XXX Average domain reliability of top 8 sources by rank

Avg Prod Cited Readability: 0.XXX Average readability of cited sources only

Avg Prod Top 8 Readability: 0.XXX Average readability of top 8 sources by rank

Avg Alt1 Top 8 Domain Reliability: 0.XXX Average DR of top 8 sources by Alt1 rank (n/a if Alt1 not configured)

Avg Alt1 Top 8 Readability: 0.XXX Average readability of top 8 sources by Alt1 rank (n/a if Alt1 not configured)

Avg Alt2 Top 8 Domain Reliability: 0.XXX Average DR of top 8 sources by Alt2 rank (n/a if Alt2 not configured)

Avg Alt2 Top 8 Readability: 0.XXX Average readability of top 8 sources by Alt2 rank (n/a if Alt2 not configured)

Sources Table

The sources table shows articles considered for the answer:

Age	Time since article was published
Rnk	Article rank (lower = higher relevance)
Score	Combined relevance score (0-1)
Sem	Semantic similarity score
BM25	Keyword matching score
Cross	Cross-encoder relevance score
DR	Domain Reliability score (0-100)
Depth	Content Depth Score (0-5)
Pos	Positive Sentiment
Neg	Negative Sentiment
Sent	Sentiment summary (sum of the positive and negative sentiment scores)
Read	Readability score
Decay	Time adjusted semantic score
Class	Article classification
Excl	Whether the content is ignored

Source Row Colors

Pink background: Top-8 article NOT cited in the answer

White background: Article cited in the answer OR ranked below 8

Score Colors

0.600 High score (>= 0.5)

0.350 Medium score (>= 0.2 and < 0.5)

0.100 Low score (< 0.2)

Keyboard Shortcuts

Escape: Close this help dialog

LLM-as-Judge Evaluation

This feature uses OpenAI's GPT-4o-mini to evaluate and compare answer quality. It is located between the Answer and Sources sections in each question card.

How to Use:

Expand the "LLM-as-Judge Evaluation" section
Click "Evaluate Answers" button to trigger evaluation
Evaluation includes the Production answer, plus Alt Answer 1 and/or Alt Answer 2 if generated

Evaluation Criteria (1-4 scale):

Accuracy	Is the answer factually correct? Evaluates correctness of claims, dates, facts, and statements
Completeness & Relevance	How thoroughly the answer addresses all aspects of the question
Contextual Understanding	How well the answer demonstrates understanding of the question's nuances
Clarity	How clear and easy to understand the answer is
Conciseness	How efficiently the answer conveys information without verbosity
Tone & Language Flow	How natural, professional, and appropriate the language is
Formatting	How well-structured and visually organized the answer is

Score Scale:

4 (VeryGood) Excellent performance on this criterion

3 (Good) Adequate performance with minor issues

2 (Bad) Noticeable problems affecting quality

1 (VeryBad) Significant failures on this criterion

Per-Criterion Reasoning:

Each criterion includes a detailed reasoning (2-3 sentences) explaining why that specific score was assigned. The reasoning appears below each criterion score in italic text.

General Commentary:

Each answer receives a holistic assessment with specific examples and observations. This appears at the bottom of each score card in a highlighted box.

Comparison Analysis:

When multiple answers are available (Prod + Alt1 and/or Alt2), the LLM provides a comparative analysis explaining which answer is best and why, considering both content quality and source relevance.

Source Context:

The evaluation considers the top 8 ranked sources (title, description, and fragment) for each answer type, using Production ranking for the Prod answer and Alt Score ranking for Alt answers.

Formulas

Article Score

The article score is calculated differently based on whether the content is static (e.g., Wikipedia) or has a publication date.

For Static Content:

score = 0.50 × score_semantic_pct + 0.15 × score_bm_25_pct + 0.35 × score_cross_pct

Where:

• score_semantic_pct = semantic similarity score (percentile-normalized)

• score_bm_25_pct = BM25 keyword matching score (percentile-normalized)

• score_cross_pct = cross-encoder score (percentile-normalized)

Weights: Semantic: 50% • BM25: 15% • Cross-encoder: 35%

For Non-Static Content (with publication date):

score = 0.15 × score_semantic_pct + 0.35 × score_decay_pct + 0.15 × score_bm_25_pct + 0.35 × score_cross_pct

Where:

• score_semantic_pct = semantic similarity score (percentile-normalized)

• score_decay_pct = time-decayed score (percentile-normalized)

• score_bm_25_pct = BM25 keyword matching score (percentile-normalized)

• score_cross_pct = cross-encoder score (percentile-normalized)

Weights: Semantic: 15% • Time decay: 35% • BM25: 15% • Cross-encoder: 35%
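
Both weightings can be expressed in one small Python function. This is only a sketch: the function name and the convention of passing decay_pct=None to mark static content are assumptions made for illustration, while the weights come from the formulas above.

```python
def article_score(semantic_pct, bm25_pct, cross_pct, decay_pct=None):
    """Combine percentile-normalized component scores into an article score.

    decay_pct=None is used here (by assumption) to mark static content,
    which has no publication date and therefore no decay component.
    """
    if decay_pct is None:
        # Static content: Semantic 50%, BM25 15%, Cross-encoder 35%.
        return 0.50 * semantic_pct + 0.15 * bm25_pct + 0.35 * cross_pct
    # Dated content: Semantic 15%, Time decay 35%, BM25 15%, Cross-encoder 35%.
    return (
        0.15 * semantic_pct
        + 0.35 * decay_pct
        + 0.15 * bm25_pct
        + 0.35 * cross_pct
    )


static_score = article_score(0.8, 0.4, 0.6)
dated_score = article_score(0.8, 0.4, 0.6, decay_pct=0.2)
```

With the same semantic, BM25, and cross-encoder inputs, the dated article scores lower here because 35% of its weight rests on a low decay percentile.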

Decay Score

Time decay adjusts the semantic score based on article age, giving more weight to recent content.

score_decay = base_score × e^(-λ × t^p)

Where:

• score_decay = the adjusted score after applying time decay

• base_score = the original semantic score before decay

• e = Euler's number (≈ 2.71828)

• λ = decay rate = 0.03 (per day^p, so that λ × t^p is dimensionless)

• t = time since publication in days

• p = power = 1.25
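
A minimal Python sketch of this formula with the production parameters follows. The function names are hypothetical, and clamping negative ages to zero is an assumption, not documented behavior.

```python
import math
from datetime import datetime

PROD_LAMBDA = 0.03  # decay rate from the production formula
PROD_POWER = 1.25   # power applied to the age in days


def days_between(published_at, asked_at):
    """Age in days (t) from publication time to the time the question was asked."""
    seconds = (asked_at - published_at).total_seconds()
    return max(seconds / 86400.0, 0.0)  # assumption: negative ages clamp to 0


def decay_score(base_score, t_days, lam=PROD_LAMBDA, p=PROD_POWER):
    """score_decay = base_score * e^(-lam * t^p)."""
    return base_score * math.exp(-lam * t_days ** p)


t = days_between(datetime(2026, 1, 1), datetime(2026, 1, 15, 12, 0))
fresh = decay_score(0.9, 0.0)         # no decay at t = 0
ten_days = decay_score(1.0, 10.0)     # roughly 0.59 of the base score
```

With these parameters an article keeps about 59% of its semantic score after 10 days, and the power p > 1 makes the decay accelerate with age.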

Alt6 Scoring (Temporal-Aware Relevance)

Alt6 is an experimental scoring formula designed for temporally-aware questions. It combines relevance with a temporal decay factor based on the question's temporal subclass.

Formula:

Alt6 = RelPct × DecayFactor

Where:

• RelPct = Percentile rank of the source's Relevance across all sources in the session pool

• DecayFactor = Time decay based on temporal subclass (see below)

Relevance Calculation:

Relevance = w_cross × p_cross + (1 - w_cross) × p_bm25

Where:

• w_cross = 0.85 (default, configurable in Alt6 settings)

• p_cross = score_cross_pct (cross-encoder percentile)

• p_bm25 = score_bm_25_pct (BM25 percentile)
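
The Relevance blend and the final Alt6 product can be sketched as below. The percentile-rank definition used here (fraction of pool values less than or equal to the value) is an assumption, because the text does not spell out how ties or interpolation are handled, and all function names are illustrative.

```python
from bisect import bisect_right


def relevance(p_cross, p_bm25, w_cross=0.85):
    """Relevance = w_cross * p_cross + (1 - w_cross) * p_bm25."""
    return w_cross * p_cross + (1.0 - w_cross) * p_bm25


def percentile_rank(value, pool):
    """Assumed definition: share of pool values that are <= value (0-1)."""
    ordered = sorted(pool)
    return bisect_right(ordered, value) / len(ordered)


def alt6_score(rel_pct, decay_factor):
    """Alt6 = RelPct * DecayFactor."""
    return rel_pct * decay_factor


# Made-up session pool of raw Relevance values.
pool = [0.1, 0.5, 0.9, 0.7]
rel_pct = percentile_rank(0.5, pool)
score = alt6_score(rel_pct, 0.5)
```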

Temporal Subclasses & Decay Parameters:

BRT	Breaking/Recent Topics - Half-life: 1 day, Floor: 0.05
RWR	Recent Window Required - Half-life: 7 days, Floor: 0.10
TAH	Temporally-Aware Historical - DecayFactor = 1.0 (no decay)
CRBN	Context-Rich Background News - Half-life: 30 days, Floor: 0.20
UNKNW	Unknown - Half-life: 14 days, Floor: 0.15

Decay formula: DecayFactor = max(floor, 0.5^(age_days / half_life))
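
The per-subclass decay can be sketched in Python using the parameters listed above. The dictionary layout and function names are illustrative assumptions. The helper age_to_floor follows directly from the formula: the floor is reached when 0.5^(age / half_life) = floor, that is, at age = half_life × log2(1 / floor).

```python
import math

# (half_life_days, floor) per temporal subclass, from the table above.
SUBCLASS_PARAMS = {
    "BRT": (1.0, 0.05),
    "RWR": (7.0, 0.10),
    "CRBN": (30.0, 0.20),
    "UNKNW": (14.0, 0.15),
}


def decay_factor(subclass, age_days):
    """DecayFactor = max(floor, 0.5 ** (age_days / half_life)); TAH never decays."""
    if subclass == "TAH":
        return 1.0
    half_life, floor = SUBCLASS_PARAMS[subclass]
    return max(floor, 0.5 ** (age_days / half_life))


def age_to_floor(half_life, floor):
    """Age in days at which the decay curve first reaches the floor."""
    return half_life * math.log2(1.0 / floor)


rwr_week_old = decay_factor("RWR", 7.0)   # exactly one half-life
brt_old = decay_factor("BRT", 10.0)       # far past the floor
rwr_floor_age = age_to_floor(7.0, 0.10)   # a little over 23 days
```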

Unknown Publish Date Handling:

Sources with missing or invalid publish dates (an age over 10 years is treated as an anomaly) use subclass-specific fallback decay factors:

BRT/RWR	Floor value (penalize unknown for time-sensitive questions)
TAH	1.0 (date irrelevant for timeless content)
CRBN	0.70 (moderate - reference content more forgiving)
UNKNW	Floor value (conservative default)

Sources using fallback show "(no date)" indicator in purple.
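
A sketch of the fallback selection, assuming the floor values from the subclass table. The names are hypothetical, and two details are assumptions rather than documented behavior: "10 years" is taken as 10 × 365.25 days, and negative ages are treated as invalid.

```python
# Floors for subclasses whose fallback is the floor value.
FLOORS = {"BRT": 0.05, "RWR": 0.10, "UNKNW": 0.15}

MAX_PLAUSIBLE_AGE_DAYS = 10 * 365.25  # assumption: "10 years" in days


def has_usable_date(age_days):
    """False for a missing date, a negative age, or an age over 10 years."""
    return age_days is not None and 0 <= age_days <= MAX_PLAUSIBLE_AGE_DAYS


def fallback_decay_factor(subclass):
    """Decay factor used when the publish date is missing or invalid."""
    if subclass == "TAH":
        return 1.0   # date irrelevant for timeless content
    if subclass == "CRBN":
        return 0.70  # moderate penalty for reference content
    return FLOORS[subclass]  # BRT, RWR, UNKNW fall back to their floor


brt_fallback = fallback_decay_factor("BRT")
```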

UI Indicators:

Rel(X.XXX)→RelPct(X.XX) - Shows raw Relevance score and its percentile rank in the session pool
q-rank: N/M - Shows the source's rank within this question (N of M sources, sorted by Relevance descending)
ⓘ icons - Hover for detailed breakdown tooltips showing raw values and calculations
Session pool - Displayed in the Alt6 config panel, shows total sources used for percentile calculation
"inputs missing" - Shown when a source lacks cross_pct or bm25_pct values needed for Alt6
Decay badge - Shows temporal subclass and decay parameters near the question/answer

Score Deltas:

ScoreΔ = Alt6 - ProdScore (positive = Alt6 scores higher)
RankΔ = ProdRank - Alt6Rank (positive = Alt6 ranks the source higher)

Relevance Health Diagnostic

A per-question diagnostic that summarizes the absolute strength of the retrieved candidate set using RAW Relevance (not RelPct), to answer: "Do we have enough strong sources, or should we retrieve more / broaden search?"

Status Levels:

🟢 Strong max ≥ 0.75 AND count(Relevance ≥ 0.70) ≥ 3

🟡 Thin max ≥ 0.65 AND count(Relevance ≥ 0.60) ≥ 2

🔴 Weak — expand sources Otherwise (candidate set may need broader search)
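
The three status levels translate into a short classifier. This is a sketch with a hypothetical function name. It takes the raw Relevance values of the valid sources for one question, and treating an empty list as weak is an assumption.

```python
def relevance_health(relevances):
    """Classify a candidate set as 'strong', 'thin', or 'weak' from raw Relevance."""
    if not relevances:
        return "weak"  # assumption: no valid sources counts as weak
    top = max(relevances)
    if top >= 0.75 and sum(r >= 0.70 for r in relevances) >= 3:
        return "strong"
    if top >= 0.65 and sum(r >= 0.60 for r in relevances) >= 2:
        return "thin"
    return "weak"


status = relevance_health([0.80, 0.72, 0.71, 0.30])
```

Note that a single very strong source is not enough: a set such as [0.90, 0.20] is still classified as weak because too few sources clear the count thresholds.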

Detail Panel (click to expand):

Counts: N total sources, N valid (with pCross and pBM25)
Distribution: max, p75, median of Relevance
Strength: Count of sources ≥0.70, ≥0.60, ≥0.50

Note: This diagnostic uses raw Relevance (time-independent signal: meaning + keywords) to detect when the candidate set is weak even if ranks or percentiles look fine. It does not affect ranking or selection.

Decay Lab Results

Diagnostics showing how temporal decay is behaving across each subclass. Helps tune Half-life and Floor parameters safely.

Metrics (per subclass):

Floor-bound rate: Percentage of sources at the decay floor ("as old as we'll treat them")
Median decay: Typical DecayFactor value—shows how strongly time affects results
Age@95% floor: Age by which 95% of floor-bound sources have reached the floor
Spread (P75–P25): Interquartile range of DecayFactor values (75th percentile minus 25th percentile); indicates decay curve diversity
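
Three of these metrics can be sketched with Python's statistics module. The function name is hypothetical, the quartiles use the default method of statistics.quantiles (the dashboard may compute percentiles differently), and at least two DecayFactor values are required.

```python
import statistics


def decay_lab_metrics(decay_factors, floor):
    """Summarize DecayFactor values for one subclass (needs >= 2 values)."""
    n = len(decay_factors)
    # A source is floor-bound when its DecayFactor sits at the floor.
    floor_bound = sum(1 for d in decay_factors if d <= floor + 1e-12)
    q1, _, q3 = statistics.quantiles(decay_factors, n=4)
    return {
        "floor_bound_rate": floor_bound / n,
        "median_decay": statistics.median(decay_factors),
        "spread": q3 - q1,  # P75 minus P25
    }


# Made-up DecayFactor values for a subclass with floor 0.10.
lab = decay_lab_metrics([0.1, 0.1, 0.2, 0.4, 0.8], floor=0.10)
```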

Status Indicators:

🟢 Normal Metric is within expected range for this subclass

🟡 Warning Metric is slightly outside expected range

🔴 Concerning Metric is far outside expected range—consider adjusting parameters

Tuning Tips:

High floor-bound rate: Try increasing half-life or lowering floor
Low floor-bound rate: Try decreasing half-life or raising floor
Median decay too low: Try increasing half-life or raising floor
Median decay too high: Try decreasing half-life or lowering floor

Note: TAH (Temporally-Aware Historical) has decay disabled (DecayFactor = 1.0 always). Input fields show "Typical" ranges below them, with warnings for values outside recommended bounds.

Technical Documentation (for Claude Code)

Last updated: 2026-01-31

Dashboard Startup Commands

There are two dashboards in this project:

Ask Dashboard (Q&A monitoring) - Port 5002
    Command: .venv/bin/python3 dashboards/run_ask_dashboard.py
    URL: http://localhost:5002
Article Dashboard (streaming article monitor) - Port 8083
    Command: .venv/bin/python3 dashboards/backend/flask_dashboard.py
    URL: http://localhost:8083

Important: Dashboard File Locations

Warning: There is a deprecated flask_dashboard.py in the project root. Do NOT use it.

Correct location: dashboards/backend/flask_dashboard.py - Uses templates from dashboards/templates/
Deprecated (DO NOT USE): flask_dashboard.py (root) - Uses old dashboard_template.html with mismatched API endpoints

Directory Structure

dashboards/
├── backend/
│   └── flask_dashboard.py    # Article Dashboard backend (port 8083)
├── templates/
│   ├── ask_dashboard.html    # Ask Dashboard template
│   └── main_dashboard.html   # Article Dashboard template
└── run_ask_dashboard.py      # Ask Dashboard backend (port 5002)

Restarting Dashboards

To kill and restart a dashboard:

# Ask Dashboard
pkill -9 -f "run_ask_dashboard" 2>/dev/null; sleep 1; .venv/bin/python3 dashboards/run_ask_dashboard.py &

# Article Dashboard
pkill -9 -f "flask_dashboard" 2>/dev/null; sleep 1; .venv/bin/python3 dashboards/backend/flask_dashboard.py &
