Overview
This dashboard monitors questions asked through the Ask module in real time. It displays questions, answers, source articles, named entities, and performance metrics.
Header Controls
- Auto-refresh: Toggle to enable/disable automatic data refresh
- Interval: Set how often the dashboard refreshes (1-15 minutes)
- Last updated: Shows when data was last fetched
Summary Statistics
- Total Questions: Number of questions in the current view
- Backstop Queries: Questions that required fallback processing
- Unique Cited Domains: Count of distinct domains actually cited in answers. Hover over this box to see a breakdown showing each domain, how many times it was cited, and its average Domain Reliability (DR) score.
- Last updated: Shows when the dashboard last checked for new questions
Date Range Selection:
- Preset Buttons: Quick selection options - Today, Yesterday, Last 7/15/30/60/90 days (default: Last 15)
- Custom Date Range: Enter specific start and end dates, then click "Apply" to filter. Click "Clear" to return to preset mode.
- Mutual Exclusivity: Preset buttons and custom dates are mutually exclusive - selecting one disables the other.
- Auto-refresh Indicator: Shows whether auto-refresh is enabled (green dot = ON, gray dot = OFF).
Auto-refresh Behavior:
- Auto-refresh ON: Today, Last 7, Last 15, Last 30, Last 60, Last 90 days (live monitoring)
- Auto-refresh OFF: Yesterday and Custom date ranges (historical data, no updates expected)
Performance metrics are split into two rows:
- Non-Backstop row (green): Averages for queries answered without fallback
- Backstop row (red): Averages for queries that required fallback processing
Each row shows: Total, Search, Answer, Suggest, Other, Words, W/Sec, Cites, and Avg Score (average article score of cited sources).
Note: Click section headers to expand/collapse the Category/Classification Breakdown Tables and Alt6 Ranking Analysis sections.
Category Breakdown table:
- Name: The question category (e.g., Politics, Sports, Technology)
- #: Number of questions in that category
- Avg Cited DR: Average domain reliability of cited sources, averaged across all questions in the category
- Avg Cited Score: Average production score of cited sources, averaged across all questions in the category
- Avg Top 8 DR: Average domain reliability of top 8 sources by rank, averaged across all questions in the category
- Avg Top 8 Score: Average production score of top 8 sources by rank, averaged across all questions in the category
- Avg Alt1 Score: Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
- Avg Alt1 Top 8 DR: Average domain reliability of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
- Avg Alt2 Score: Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
- Avg Alt2 Top 8 DR: Average domain reliability of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
Click the sort arrows on any column header to sort the table.
Classification Breakdown table:
- Name: The question classification (e.g., verified, unverified)
- #: Number of questions with that classification
- Avg Cited DR: Average domain reliability of cited sources, averaged across all questions with that classification
- Avg Cited Score: Average production score of cited sources, averaged across all questions with that classification
- Avg Top 8 DR: Average domain reliability of top 8 sources by rank, averaged across all questions with that classification
- Avg Top 8 Score: Average production score of top 8 sources by rank, averaged across all questions with that classification
- Avg Alt1 Score: Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
- Avg Alt1 Top 8 DR: Average domain reliability of top 8 sources by Alt1 rank (n/a if Alt Score 1 not configured)
- Avg Alt2 Score: Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
- Avg Alt2 Top 8 DR: Average domain reliability of top 8 sources by Alt2 rank (n/a if Alt Score 2 not configured)
Click the sort arrows on any column header to sort the table. Default sort is by count descending.
Alt6 Ranking Analysis (Temporal Decay vs Prod):
Compares how Alt Score 6 (temporal decay scoring) ranks sources differently from Production. Alt6 applies subclass-specific half-life decay to relevance scores, which can significantly change rankings for time-sensitive content.
Two-Tier Ranking System: Alt6 ranking uses the same two-tier approach as Production ranking:
- Tier 1 (Top positions): Non-excluded sources ranked by Alt6 score (highest = rank 1)
- Tier 2 (Bottom positions): Excluded sources ranked by Alt6 score among themselves
- An excluded source with a high Alt6 score will always rank below a non-excluded source with a lower Alt6 score
- Exclusion criteria: backend is_excluded flag, classification (adult/conspiracy/gambling), and user-configured thresholds (DR, Score, Semantic, etc.)
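The two-tier ordering described above can be sketched in a few lines of Python. This is an illustrative sketch only; the field names `is_excluded` and `alt6_score` are assumptions, not the backend's actual schema:

```python
def two_tier_rank(sources):
    """Rank sources by Alt6 score, but always place excluded sources
    below every non-excluded source (two-tier ranking)."""
    # Sort key: excluded flag first (False sorts before True),
    # then Alt6 score descending within each tier.
    return sorted(sources, key=lambda s: (s["is_excluded"], -s["alt6_score"]))

pool = [
    {"id": "a", "alt6_score": 0.9, "is_excluded": True},   # high score but excluded
    {"id": "b", "alt6_score": 0.4, "is_excluded": False},
    {"id": "c", "alt6_score": 0.7, "is_excluded": False},
]
ranked = two_tier_rank(pool)
# Non-excluded sources c and b come first; excluded a lands at the bottom
# despite having the highest raw Alt6 score.
```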
Ranking Shift Metrics:
- Avg absolute change: Average number of slots sources moved (regardless of direction)
- Avg improvement/worsening: Average slot change for sources that improved/worsened
- % improved/worsened: Percentage of sources that moved up/down in ranking
- Bottom 1/3 -> Top 1/3: Sources that jumped from bottom third to top third
- Top X dropped D+ slots: Top-ranked sources that dropped significantly
- Bottom 50% -> Top X: Sources from bottom half that made it to top X (only for questions with 50+ sources)
- Jumped/Dropped K+ slots: Sources with significant rank changes
Configure thresholds (Top X, K, D) in Ranking Metrics Settings. Summary shows percentages; per-question shows raw counts.
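A few of the summary metrics above can be sketched as follows (a minimal illustration, assuming rank maps of source id to 1-based rank; the function name is hypothetical):

```python
def shift_metrics(prod_rank, alt_rank):
    """Basic ranking-shift metrics between Production and Alt ranks.
    Positive delta means the source moved up (improved) under Alt."""
    deltas = [prod_rank[s] - alt_rank[s] for s in prod_rank]
    moved_up = [d for d in deltas if d > 0]
    moved_down = [d for d in deltas if d < 0]
    n = len(deltas)
    return {
        "avg_abs_change": sum(abs(d) for d in deltas) / n,
        "pct_improved": 100 * len(moved_up) / n,
        "pct_worsened": 100 * len(moved_down) / n,
        "avg_improvement": sum(moved_up) / len(moved_up) if moved_up else 0.0,
    }

prod = {"a": 1, "b": 2, "c": 3, "d": 4}
alt = {"a": 3, "b": 1, "c": 2, "d": 4}
m = shift_metrics(prod, alt)
# a dropped 2 slots, b and c each improved 1 slot, d is unchanged:
# avg_abs_change 1.0, 50% improved, 25% worsened.
```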
Ranking Shift Metrics (per question):
Expand this section on any question card to see detailed ranking shift metrics for that specific question, comparing Alt6 vs Production ranking.
Master Article Decay Score Settings:
Configure alternative decay score formulas. The decay formula is: Decay Score = Base Score × e^(-(λ × (t^p)))
- Prod Decay Score: Read-only display of production values (λ=0.03, p=1.25)
- Alt Decay Score 1 & 2: Configure custom decay formulas
Variables (nested dependencies):
- Base Score: Semantic/embedding score (always independently selectable)
- e (Euler's number): Constant ≈ 2.71828 - enables exponential decay
- ^(-(exponent)): Negative exponential (requires e to be checked)
- λ (Decay Rate): Controls decay speed (requires ^(-(exponent)))
- t (Days Since Publication): Time variable (requires ^(-(exponent)))
- p (Power): Exponent for time (requires t)
Check the Save checkbox to persist your configuration across page refreshes.
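The decay formula above is straightforward to express in code. A minimal sketch, with defaults mirroring the production values (λ=0.03, p=1.25):

```python
import math

def decay_score(base, lam=0.03, p=1.25, t=0.0):
    """Decay Score = Base Score * e^(-(lambda * t^p)).
    base: semantic/embedding score; t: days since publication."""
    return base * math.exp(-(lam * (t ** p)))

# A freshly published article keeps its full base score...
decay_score(0.9, t=0)    # -> 0.9
# ...while older articles are discounted progressively.
month_old = decay_score(0.9, t=30)
```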
Alt Decay Columns (AD1 & AD2):
When Alt Decay Score 1 or 2 is enabled, two new columns appear in the Sources table: AD1 and AD2. These columns display:
- Computed Value: The decay score calculated using your configured formula (e.g., 0.847)
- Formula Breakdown: Shows the actual values used in the computation (e.g., 0.923 × e^(-0.05 × 14.5^1.5))
- Column Header Tooltip: Hover over AD1/AD2 header to see the configured formula
Key Details:
- Alt Decay uses the raw semantic score (not percentalized)
- Days since publication (t) is calculated from source's published_at vs question's asked_at
- Columns update immediately when you change configuration values while toggle is ON
- When toggle is OFF, columns display "-" for all sources
Show PCT Mode:
- When "Show PCT" is enabled, Alt Decay columns display percentalized values (0-1 range)
- Percentalization uses min-max normalization across all sources within the question
- Column headers show "(%) " suffix when PCT mode is ON
- Highest Alt Decay value becomes 1.0, lowest becomes 0.0
- In PCT mode, formula breakdown is hidden (only the percentile value is shown)
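The min-max normalization used by Show PCT mode amounts to the following sketch (illustrative; the all-equal fallback to 1.0 is an assumption, not confirmed dashboard behavior):

```python
def percentalize(values):
    """Min-max normalize decay scores to the 0-1 range, as in Show PCT
    mode: the highest value becomes 1.0, the lowest becomes 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:                 # all sources identical: avoid division by zero
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

percentalize([0.0, 0.5, 1.0])   # -> [0.0, 0.5, 1.0]
```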
Filters
- User: Filter questions by specific user
- Classification: Filter by question type (Investigative, Temporally-Aware, etc.)
- Category: Filter by content category (politics, business, sports, etc.)
- Backstop: Filter by backstop status (All, Non-Backstop, or Backstop only)
All metrics and statistics update dynamically based on the current filter selection.
Question Cards
Each question is displayed as a card with expandable sections:
- Answer: The generated response with citation numbers (tabs for Alt Answers in header)
- Sources: Source articles used (click headers to expand)
- Suggestions: Follow-up questions
- Named Entities: People, organizations, locations identified
- Performance Metrics: Timing data for the query
- Ranking Shift Metrics: How Alt Score rankings compare to Prod rankings
Answer Section (Three-Column Layout):
The Answer section displays three columns side-by-side for easy comparison:
- Answer (Column 1): The production answer from the original query
- Alt Answer 1 (Column 2): An alternative answer generated from the top 8 sources ranked by Alt Score 1
- Alt Answer 2 (Column 3): An alternative answer generated from the top 8 sources ranked by Alt Score 2
How to use the Answer Section:
- Expand/Collapse: Click anywhere on the Answer header to toggle the section
- Side-by-Side Comparison: All three answers are visible simultaneously when expanded
- Vertical Dividers: Columns are separated by vertical lines for clarity
- Equal Width: Each column takes 33% of the width
How Alt Answers work:
- Alt Score 1 or 2 must be configured in Master Article Score Settings to enable generation
- Click "Generate Alt Answer" to call the API with the top 8 sources by Alt Rank
- Two-tier ranking ensures the top 8 sources are always non-excluded (excluded sources are ranked at the bottom)
- The answer is generated using the same question but with different source articles
- Generated answers are cached for the session (not persisted across page refreshes)
Question Badges
- CATEGORY: Content category (politics, sports, etc.)
- CLASSIFICATION: Question type classification
- BACKSTOP: Question required fallback processing
- Avg Prod Cited Score (0.XXX): Average production score of cited sources only
- Avg Prod Top 8 Score (0.XXX): Average production score of top 8 sources by rank (or all sources if fewer than 8)
- Avg Alt1 Top 8 Score (0.XXX): Average Alt1 score of top 8 sources by Alt1 rank (n/a if Alt1 not configured)
- Avg Alt2 Top 8 Score (0.XXX): Average Alt2 score of top 8 sources by Alt2 rank (n/a if Alt2 not configured)
Sources Section Badges
These badges appear in the Sources section for each question:
- Avg Prod Cited Domain Reliability (0.XXX): Average domain reliability of cited sources only
- Avg Prod Top 8 Domain Reliability (0.XXX): Average domain reliability of top 8 sources by rank
- Avg Prod Cited Readability (0.XXX): Average readability of cited sources only
- Avg Prod Top 8 Readability (0.XXX): Average readability of top 8 sources by rank
- Avg Alt1 Top 8 Domain Reliability (0.XXX): Average DR of top 8 sources by Alt1 rank (n/a if Alt1 not configured)
- Avg Alt1 Top 8 Readability (0.XXX): Average readability of top 8 sources by Alt1 rank (n/a if Alt1 not configured)
- Avg Alt2 Top 8 Domain Reliability (0.XXX): Average DR of top 8 sources by Alt2 rank (n/a if Alt2 not configured)
- Avg Alt2 Top 8 Readability (0.XXX): Average readability of top 8 sources by Alt2 rank (n/a if Alt2 not configured)
Sources Table
The sources table shows articles considered for the answer:
| Age | Time since article was published |
| Rnk | Article rank (lower = higher relevance) |
| Score | Combined relevance score (0-1) |
| Sem | Semantic similarity score |
| BM25 | Keyword matching score |
| Cross | Cross-encoder relevance score |
| DR | Domain Reliability score (0-100) |
| Depth | Content Depth Score (0-5) |
| Pos | Positive Sentiment |
| Neg | Negative Sentiment |
| Sent | Sentiment summary (sum of the positive and negative sentiment scores) |
| Read | Readability score |
| Decay | Time-adjusted semantic score |
| Class | Article classification |
| Excl | Whether the article is excluded from Tier 1 ranking |
Source Row Colors
Pink background: Top-8 article NOT cited in the answer
White background: Article cited in the answer OR ranked below 8
Score Colors
- 0.600: High score (>= 0.5)
- 0.350: Medium score (0.2 - 0.5)
- 0.100: Low score (< 0.2)
Keyboard Shortcuts
- Escape: Close this help dialog
LLM-as-Judge Evaluation
This feature uses OpenAI's GPT-4o-mini to evaluate and compare answer quality. Located between the Answer and Sources sections in each question card.
How to Use:
- Expand the "LLM-as-Judge Evaluation" section
- Click "Evaluate Answers" button to trigger evaluation
- Evaluation includes the Production answer, plus Alt Answer 1 and/or Alt Answer 2 if generated
Evaluation Criteria (1-4 scale):
| Accuracy | Is the answer factually correct? Evaluates correctness of claims, dates, facts, and statements |
| Completeness & Relevance | How thoroughly the answer addresses all aspects of the question |
| Contextual Understanding | How well the answer demonstrates understanding of the question's nuances |
| Clarity | How clear and easy to understand the answer is |
| Conciseness | How efficiently the answer conveys information without verbosity |
| Tone & Language Flow | How natural, professional, and appropriate the language is |
| Formatting | How well-structured and visually organized the answer is |
Score Scale:
- 4 (VeryGood): Excellent performance on this criterion
- 3 (Good): Adequate performance with minor issues
- 2 (Bad): Noticeable problems affecting quality
- 1 (VeryBad): Significant failures on this criterion
Per-Criterion Reasoning:
Each criterion includes a detailed reasoning (2-3 sentences) explaining why that specific score was assigned. The reasoning appears below each criterion score in italic text.
General Commentary:
Each answer receives a holistic assessment with specific examples and observations. This appears at the bottom of each score card in a highlighted box.
Comparison Analysis:
When multiple answers are available (Prod + Alt1 and/or Alt2), the LLM provides a comparative analysis explaining which answer is best and why, considering both content quality and source relevance.
Source Context:
The evaluation considers the top 8 ranked sources (title, description, and fragment) for each answer type, using Production ranking for the Prod answer and Alt Score ranking for Alt answers.
Alt6 Scoring (Temporal-Aware Relevance)
Alt6 is an experimental scoring formula designed for temporally-aware questions. It combines relevance with a temporal decay factor based on the question's temporal subclass.
Formula:
Alt6 = RelPct × DecayFactor
Where:
• RelPct = Percentile rank of the source's Relevance across all sources in the session pool
• DecayFactor = Time decay based on temporal subclass (see below)
Relevance Calculation:
Relevance = w_cross × p_cross + (1 - w_cross) × p_bm25
Where:
• w_cross = 0.85 (default, configurable in Alt6 settings)
• p_cross = score_cross_pct (cross-encoder percentile)
• p_bm25 = score_bm_25_pct (BM25 percentile)
Temporal Subclasses & Decay Parameters:
| BRT | Breaking/Recent Topics - Half-life: 1 day, Floor: 0.05 |
| RWR | Recent Window Required - Half-life: 7 days, Floor: 0.10 |
| TAH | Temporally-Aware Historical - DecayFactor = 1.0 (no decay) |
| CRBN | Context-Rich Background News - Half-life: 30 days, Floor: 0.20 |
| UNKNW | Unknown - Half-life: 14 days, Floor: 0.15 |
Decay formula: DecayFactor = max(floor, 0.5^(age_days / half_life))
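Putting the Relevance and Alt6 formulas together with the subclass table above gives the following sketch (illustrative only; function names are assumptions, and the unknown-date fallback described below is not covered here):

```python
# Half-life (days) and floor per temporal subclass, as listed above.
DECAY_PARAMS = {
    "BRT":   (1.0,  0.05),
    "RWR":   (7.0,  0.10),
    "CRBN":  (30.0, 0.20),
    "UNKNW": (14.0, 0.15),
}

def relevance(p_cross, p_bm25, w_cross=0.85):
    """Relevance = w_cross * p_cross + (1 - w_cross) * p_bm25."""
    return w_cross * p_cross + (1 - w_cross) * p_bm25

def alt6(rel_pct, subclass, age_days):
    """Alt6 = RelPct * DecayFactor, where
    DecayFactor = max(floor, 0.5 ** (age_days / half_life)).
    TAH disables decay entirely (DecayFactor = 1.0)."""
    if subclass == "TAH":
        return rel_pct
    half_life, floor = DECAY_PARAMS[subclass]
    decay = max(floor, 0.5 ** (age_days / half_life))
    return rel_pct * decay

# An RWR source exactly one half-life old (7 days) keeps half its RelPct:
alt6(0.8, "RWR", 7)       # -> 0.4
# A 10-day-old BRT source has decayed to its floor (0.05).
alt6(1.0, "BRT", 10)
```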
Unknown Publish Date Handling:
Sources with missing or invalid publish dates (an age over 10 years is treated as an anomaly) use subclass-specific fallback decay factors:
| BRT/RWR | Floor value (penalize unknown for time-sensitive questions) |
| TAH | 1.0 (date irrelevant for timeless content) |
| CRBN | 0.70 (moderate - reference content more forgiving) |
| UNKNW | Floor value (conservative default) |
Sources using fallback show "(no date)" indicator in purple.
UI Indicators:
- Rel(X.XXX)→RelPct(X.XX) - Shows raw Relevance score and its percentile rank in the session pool
- q-rank: N/M - Shows the source's rank within this question (N of M sources, sorted by Relevance descending)
- ⓘ icons - Hover for detailed breakdown tooltips showing raw values and calculations
- Session pool - Displayed in the Alt6 config panel, shows total sources used for percentile calculation
- "inputs missing" - Shown when a source lacks cross_pct or bm25_pct values needed for Alt6
- Decay badge - Shows temporal subclass and decay parameters near the question/answer
Score Deltas:
- ScoreΔ = Alt6 - ProdScore (positive = Alt6 scores higher)
- RankΔ = ProdRank - Alt6Rank (positive = Alt6 ranks the source higher)
Relevance Health Diagnostic
A per-question diagnostic that summarizes the absolute strength of the retrieved candidate set using RAW Relevance (not RelPct), to answer: "Do we have enough strong sources, or should we retrieve more / broaden search?"
Status Levels:
- 🟢 Strong: max ≥ 0.75 AND count(Relevance ≥ 0.70) ≥ 3
- 🟡 Thin: max ≥ 0.65 AND count(Relevance ≥ 0.60) ≥ 2
- 🔴 Weak — expand sources: Otherwise (candidate set may need broader search)
Detail Panel (click to expand):
- Counts: N total sources, N valid (with pCross and pBM25)
- Distribution: max, p75, median of Relevance
- Strength: Count of sources ≥0.70, ≥0.60, ≥0.50
Note: This diagnostic uses raw Relevance (time-independent signal: meaning + keywords) to detect when the candidate set is weak even if ranks or percentiles look fine. It does not affect ranking or selection.
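The status thresholds above reduce to a simple classification over the raw Relevance values. A minimal sketch (the function name and the empty-set fallback to "weak" are assumptions):

```python
def relevance_health(relevances):
    """Classify a question's candidate set using raw Relevance,
    per the Strong/Thin/Weak thresholds above."""
    if not relevances:
        return "weak"
    mx = max(relevances)
    if mx >= 0.75 and sum(1 for r in relevances if r >= 0.70) >= 3:
        return "strong"
    if mx >= 0.65 and sum(1 for r in relevances if r >= 0.60) >= 2:
        return "thin"
    return "weak"

relevance_health([0.80, 0.74, 0.71, 0.30])   # -> "strong"
relevance_health([0.66, 0.61])               # -> "thin"
```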
Decay Lab Results
Diagnostics showing how temporal decay is behaving across each subclass. Helps tune Half-life and Floor parameters safely.
Metrics (per subclass):
- Floor-bound rate: Percentage of sources at the decay floor ("as old as we'll treat them")
- Median decay: Typical DecayFactor value—shows how strongly time affects results
- Age@95% floor: Age by which 95% of floor-bound sources have reached the floor
- Spread (P75–P25): Range of DecayFactor values—indicates decay curve diversity
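Three of these per-subclass metrics can be computed as in the sketch below (illustrative; the function name and the epsilon tolerance for "at the floor" are assumptions, and Age@95% floor is omitted):

```python
import statistics

def decay_lab(decay_factors, floor, eps=1e-9):
    """Per-subclass Decay Lab metrics: floor-bound rate (%),
    median DecayFactor, and P75-P25 spread of DecayFactor values."""
    n = len(decay_factors)
    floor_bound = sum(1 for d in decay_factors if abs(d - floor) < eps)
    q = statistics.quantiles(decay_factors, n=4)   # [P25, P50, P75]
    return {
        "floor_bound_rate": 100 * floor_bound / n,
        "median_decay": statistics.median(decay_factors),
        "spread": q[2] - q[0],
    }

# Two of four sources sit at the floor -> 50% floor-bound rate.
metrics = decay_lab([0.1, 0.1, 0.5, 0.9], floor=0.1)
```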
Status Indicators:
- 🟢 Normal: Metric is within expected range for this subclass
- 🟡 Warning: Metric is slightly outside expected range
- 🔴 Concerning: Metric is far outside expected range—consider adjusting parameters
Tuning Tips:
- High floor-bound rate: Try increasing half-life or lowering floor
- Low floor-bound rate: Try decreasing half-life or raising floor
- Median decay too low: Try increasing half-life or raising floor
- Median decay too high: Try decreasing half-life or lowering floor
Note: TAH (Temporally-Aware Historical) has decay disabled (DecayFactor = 1.0 always). Input fields show "Typical" ranges below them, with warnings for values outside recommended bounds.
Technical Documentation (for Claude Code)
Last updated: 2026-01-31
Dashboard Startup Commands
There are two dashboards in this project:
- Ask Dashboard (Q&A monitoring) - Port 5002
.venv/bin/python3 dashboards/run_ask_dashboard.py
URL: http://localhost:5002
- Article Dashboard (streaming article monitor) - Port 8083
.venv/bin/python3 dashboards/backend/flask_dashboard.py
URL: http://localhost:8083
Important: Dashboard File Locations
Warning: There is a deprecated flask_dashboard.py in the project root. Do NOT use it.
- Correct location:
dashboards/backend/flask_dashboard.py - Uses templates from dashboards/templates/
- Deprecated (DO NOT USE):
flask_dashboard.py (root) - Uses old dashboard_template.html with mismatched API endpoints
Directory Structure
dashboards/
├── backend/
│ └── flask_dashboard.py # Article Dashboard backend (port 8083)
├── templates/
│ ├── ask_dashboard.html # Ask Dashboard template
│ └── main_dashboard.html # Article Dashboard template
└── run_ask_dashboard.py # Ask Dashboard backend (port 5002)
Restarting Dashboards
To kill and restart a dashboard:
# Ask Dashboard
pkill -9 -f "run_ask_dashboard" 2>/dev/null; sleep 1; .venv/bin/python3 dashboards/run_ask_dashboard.py &
# Article Dashboard
pkill -9 -f "flask_dashboard" 2>/dev/null; sleep 1; .venv/bin/python3 dashboards/backend/flask_dashboard.py &