# Lower-League Football Derivatives — AI Betting System Spec
**Version:** 1.0
**Date:** March 2026
**Starting Bankroll:** £500
**Output:** Telegram alerts with bet recommendations
---
## 1. What This System Does
A pre-match AI system that scans upcoming lower-league football fixtures, models the probability distribution of in-match events (goals, corners, cards), compares those probabilities to bookmaker odds, and sends you a Telegram alert when it finds a bet where your model has a meaningful edge.
You receive a message like:
```
🔔 VALUE BET FOUND
Match: Barnsley vs Leyton Orient (League One)
Kickoff: Sat 5 Apr, 15:00
Market: Over 2.5 Goals
Bookmaker: Pinnacle @ 2.15
Model Probability: 54.2%
Model Fair Odds: 1.85
Edge: +7.7%
Stake: £15.00 (3.0% of bankroll, capped)
Expected Value: +£2.48
Confidence: ★★★☆☆
```
---
## 2. Target Leagues
Start with these — they have good data coverage but far less modeling competition than the top 5 European leagues:
### Phase 1 (Launch)
| League | Country | Why |
|--------|---------|-----|
| League One | England | Good data via FBref, decent bookmaker coverage |
| League Two | England | Even less attention from sharp bettors |
| Scottish Premiership | Scotland | Small league, stable teams, predictable patterns |
| Championship | England | More liquid odds, good data — slightly sharper market |
### Phase 2 (Expand after 3 months)
| League | Country | Why |
|--------|---------|-----|
| Danish Superliga | Denmark | Growing data, thin modeling competition |
| Norwegian Eliteserien | Norway | Summer league (fills gaps when English season is off) |
| Swedish Allsvenskan | Sweden | Same as Norway |
| Belgian Pro League | Belgium | Decent data, less modeled than Dutch Eredivisie |
---
## 3. Target Markets
### Phase 1 — Goals-Based (start here)
These have the best data and most liquid bookmaker odds:
| Market | Example | Why |
|--------|---------|-----|
| Over/Under Goals | Over 2.5 goals @ 2.10 | Most data, most liquid, well-established modelling approaches |
| Asian Goal Handicap | Home -0.5 @ 1.95 | More granular than 1X2, less efficient pricing |
| Both Teams to Score (BTTS) | Yes @ 1.80 | Directly derivable from goals model |
| Exact Score | 2-1 @ 8.50 | High margin = high potential edge if model is good |
| Half-Time / Full-Time | Draw/Home @ 6.00 | Complex combination = less efficient pricing |
### Phase 2 — Corners & Cards (add later)
| Market | Example | Why |
|--------|---------|-----|
| Over/Under Corners | Over 9.5 @ 1.90 | Requires separate model, good edge potential |
| Asian Corner Handicap | Home -1.5 corners @ 2.00 | Very thinly modeled in lower leagues |
| Total Cards | Over 3.5 cards @ 1.85 | Referee-dependent, exploitable with ref data |
| Player Cards | Player X to be booked @ 3.50 | Tiny market, huge margins, hard to scale |
---
## 4. System Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ DAILY AUTOMATED PIPELINE │
│ (runs every morning ~7am) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────┐
│ DATA COLLECTION │ ← Pulls today's fixtures + historical data
│ │ ← Sources: FBref, Football-Data.co.uk,
│ │ Understat, API-Football
└──────────────────┘
│
▼
┌──────────────────┐
│ FEATURE │ ← Calculates team strength, form, home/away
│ ENGINEERING │ splits, head-to-head, referee tendencies,
│ │ league-level baselines
└──────────────────┘
│
▼
┌──────────────────┐
│ MODEL LAYER │ ← Poisson regression for goals per team
│ │ ← Outputs full probability distribution
│ │ (exact score grid → derives all markets)
└──────────────────┘
│
▼
┌──────────────────┐
│ ODDS COMPARISON │ ← Pulls live odds from bookmakers via
│ & VALUE FINDER │ Odds API
│ │ ← Compares model prob vs implied prob
│ │ ← Flags bets where edge > threshold
└──────────────────┘
│
▼
┌──────────────────┐
│ BANKROLL & │ ← Fractional Kelly criterion (quarter-Kelly)
│ STAKE SIZING │ ← Max stake caps
│ │ ← Drawdown protection rules
└──────────────────┘
│
▼
┌──────────────────┐
│ TELEGRAM ALERT │ ← Sends formatted message to your phone
│ │ ← Includes: match, market, odds, stake,
│ │ edge %, confidence rating
└──────────────────┘
│
▼
┌──────────────────┐
│ BET TRACKER │ ← Logs every recommendation
│ │ ← You mark placed/skipped + result
│ │ ← Tracks P&L, ROI, CLV, model calibration
└──────────────────┘
```
---
## 5. Data Sources (All Free / Open Source)
### Historical Match Data
| Source | What it provides | Access |
|--------|-----------------|--------|
| **Football-Data.co.uk** | Match results, goals, corners, cards, shots, odds — going back 20+ years for English leagues | Free CSV downloads |
| **FBref / StatsBomb** | Advanced stats (xG, xGA, shot locations, passing) for many leagues | Free web scraping |
| **Understat** | xG model data for six top-flight leagues only (EPL, La Liga, Bundesliga, Serie A, Ligue 1, RFPL); no coverage of the Phase 1 lower leagues | Free scraping (no official API) |
| **Transfermarkt** | Squad values, transfers, injuries — useful context features | Free scraping |
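A minimal sketch of loading one season of Football-Data.co.uk results using only the standard library. It assumes the site's `mmz4281/<season>/<division>.csv` URL pattern, the division codes `E1` (Championship), `E2` (League One), `E3` (League Two) and `SC0` (Scottish Premiership), and the `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG` columns. Check these against the site's notes file before relying on them. `Match`, `season_url` and `parse_results` are hypothetical names, and `pandas.read_csv` on the same URL works equally well.

```python
import csv
import io
import urllib.request
from dataclasses import dataclass

# Assumed Football-Data.co.uk division codes; confirm against the site's notes.
DIVISIONS = {
    "Championship": "E1",
    "League One": "E2",
    "League Two": "E3",
    "Scottish Premiership": "SC0",
}


@dataclass
class Match:
    date: str
    home: str
    away: str
    home_goals: int
    away_goals: int


def season_url(division: str, start_year: int) -> str:
    """CSV URL for the season starting in start_year (2023 -> '2324')."""
    season = f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"
    return f"https://www.football-data.co.uk/mmz4281/{season}/{division}.csv"


def parse_results(csv_text: str) -> list[Match]:
    """Keep rows with a full-time score; skip padding rows and unplayed games."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("FTHG") or not row.get("FTAG"):
            continue
        matches.append(Match(row["Date"], row["HomeTeam"], row["AwayTeam"],
                             int(row["FTHG"]), int(row["FTAG"])))
    return matches


def download_season(division: str, start_year: int) -> list[Match]:
    """Fetch and parse one season (requires network access)."""
    with urllib.request.urlopen(season_url(division, start_year), timeout=30) as resp:
        # utf-8-sig strips a byte-order mark; errors="replace" tolerates odd bytes
        text = resp.read().decode("utf-8-sig", errors="replace")
    return parse_results(text)
```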
### Live Odds
| Source | What it provides | Access |
|--------|-----------------|--------|
| **The Odds API** | Real-time odds from 40+ bookmakers including Pinnacle, Bet365, Unibet | Free tier: 500 requests/month. Each request costs (markets × regions) credits, so budget for one daily pull across a few leagues and markets |
| **Pinnacle API** | Direct odds from the sharpest bookmaker (no account restrictions) | Free with Pinnacle account |
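A sketch of requesting totals odds from The Odds API and picking the best Over 2.5 price. The endpoint path, query parameters, sport keys and response shape (`bookmakers` → `markets` → `outcomes` with `name`, `price`, `point`) are assumptions to check against the v4 documentation and its `/v4/sports` listing. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.the-odds-api.com/v4/sports"

# Assumed sport keys; confirm them with the /v4/sports endpoint.
SPORT_KEYS = {
    "Championship": "soccer_efl_champ",
    "League One": "soccer_england_league1",
    "League Two": "soccer_england_league2",
    "Scottish Premiership": "soccer_spl",
}


def odds_url(sport_key, api_key, markets=("totals",), regions=("uk", "eu")):
    """Each call consumes (number of markets x number of regions) quota credits."""
    query = urllib.parse.urlencode({
        "apiKey": api_key,
        "regions": ",".join(regions),
        "markets": ",".join(markets),
        "oddsFormat": "decimal",
    })
    return f"{BASE}/{sport_key}/odds?{query}"


def best_over_price(event, line=2.5):
    """Return (bookmaker, best decimal price) for Over `line`, or None."""
    best = None
    for book in event.get("bookmakers", []):
        for market in book.get("markets", []):
            if market.get("key") != "totals":
                continue
            for outcome in market.get("outcomes", []):
                if outcome.get("name") == "Over" and outcome.get("point") == line:
                    if best is None or outcome["price"] > best[1]:
                        best = (book.get("title", book.get("key")), outcome["price"])
    return best


def fetch_odds(sport_key, api_key):
    """Download the event list (requires network access and a valid key)."""
    with urllib.request.urlopen(odds_url(sport_key, api_key), timeout=30) as resp:
        return json.load(resp)
```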
### Fixtures
| Source | What it provides | Access |
|--------|-----------------|--------|
| **API-Football** | Fixtures, lineups, referee assignments | Free tier: 100 requests/day |
---
## 6. The Model — How It Works
### Core Approach: Dixon-Coles Poisson Model
Dixon-Coles is a standard, widely used approach to football score prediction, and open-source implementations of it exist (see the libraries below). Here's the logic:
**Step 1: Estimate team attack and defence strengths**
Every team gets two ratings:
- **Attack strength** — how many goals they tend to score (relative to league average)
- **Defence strength** — how many goals they tend to concede (relative to league average)
These are estimated using historical match results with a **time-decay weighting** (recent matches matter more than old ones).
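The sketch below illustrates Step 1 with a deliberately simple method: time-decayed average goals scored and conceded, each divided by the weighted league average. This is not the maximum-likelihood fit that Dixon-Coles actually uses (penaltyblog handles that), and it ignores home/away splits. Measuring age in days with a 180-day half-life is a hypothetical choice, not the ~30-match half-life suggested in the enhancements table.

```python
from collections import defaultdict


def decay_weight(age_days: float, half_life_days: float) -> float:
    """Exponential time decay: a match half_life_days old counts half as much."""
    return 0.5 ** (age_days / half_life_days)


def team_ratings(matches, half_life_days=180.0):
    """matches: iterable of (age_days, home, away, home_goals, away_goals).

    Returns {team: (attack, defence)} relative to a league average of 1.0,
    where defence > 1 means the team concedes more than average.
    """
    scored = defaultdict(float)
    conceded = defaultdict(float)
    games = defaultdict(float)
    total_goals = total_weight = 0.0
    for age, home, away, hg, ag in matches:
        w = decay_weight(age, half_life_days)
        for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
            scored[team] += w * gf
            conceded[team] += w * ga
            games[team] += w
        total_goals += w * (hg + ag)
        total_weight += 2 * w  # two team-appearances per match
    league_avg = total_goals / total_weight  # weighted goals per team per match
    return {
        t: (scored[t] / games[t] / league_avg, conceded[t] / games[t] / league_avg)
        for t in games
    }
```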
**Step 2: Predict goals for a specific match**
For a match between Team A (home) and Team B (away):
```
Expected goals for A = Home advantage × A's attack × B's defence weakness × League avg goals
Expected goals for B = B's attack × A's defence weakness × League avg goals
```
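A direct translation of the two formulas above. It assumes ratings are multipliers around 1.0 (defence above 1.0 means a leakier defence, matching "defence weakness"), `league_avg_goals` is goals per team per match, and `home_advantage` is a multiplier. The function name and all parameter values are hypothetical.

```python
def expected_goals(home_attack, home_defence, away_attack, away_defence,
                   league_avg_goals, home_advantage):
    """Return (home expected goals, away expected goals) for one fixture."""
    # Home side: boosted by home advantage, facing the away side's defence
    home_xg = home_advantage * home_attack * away_defence * league_avg_goals
    # Away side: no home boost, facing the home side's defence
    away_xg = away_attack * home_defence * league_avg_goals
    return home_xg, away_xg
```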
**Step 3: Generate the exact score probability grid**
Using the Poisson distribution, calculate the probability of every scoreline from 0-0 to 5-5 (or higher). This gives you a 6×6 grid of probabilities.
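The Dixon-Coles adjustment addresses a known weakness of independent Poisson: it misprices the four low-scoring results (0-0, 1-0, 0-1, 1-1). A correction factor, controlled by a fitted dependence parameter ρ, rescales only those four cells. When ρ is negative, as it typically is, the factor raises the probability of 0-0 and 1-1 and lowers 1-0 and 0-1.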
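A standard-library sketch of Step 3. The correction factor `dc_tau` follows the Dixon-Coles four-cell formulation. The defaults `rho=-0.1` and `max_goals=10` are illustrative assumptions (ρ should come from the fitted model). The grid is renormalised so that truncating at `max_goals` does not lose probability mass.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate ** k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles correction; only the 0-0, 0-1, 1-0 and 1-1 cells change.

    rho must be small enough that every factor stays non-negative.
    """
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_grid(lam: float, mu: float, rho: float = -0.1, max_goals: int = 10):
    """grid[h][a] = P(home scores h, away scores a), renormalised after truncation."""
    grid = [[dc_tau(h, a, lam, mu, rho) * poisson_pmf(h, lam) * poisson_pmf(a, mu)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]
    total = sum(map(sum, grid))
    return [[p / total for p in row] for row in grid]
```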
**Step 4: Derive all market probabilities from the grid**
From that single score grid, you can calculate:
- Over 2.5 goals = sum of all cells where total goals ≥ 3 (Under 2.5 is one minus this)
- BTTS Yes = sum of all cells where both teams scored ≥ 1
- Asian Handicap Home -1.5 = sum of all cells where home goals - away goals > 1.5
- Exact Score 2-1 = the single cell for (2,1)
- Half-Time/Full-Time = requires a modified model (bivariate Poisson for each half)
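Step 4 in code. To stay self-contained, this sketch builds a plain independent-Poisson grid; in practice you would use the Dixon-Coles-adjusted grid instead. The keys in the returned dictionary are hypothetical labels, and Half-Time/Full-Time is left out because it needs the per-half model.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate ** k / math.factorial(k)


def poisson_grid(lam: float, mu: float, max_goals: int = 10):
    """Independent Poisson score grid: grid[h][a] = P(h home goals, a away goals)."""
    return [[poisson_pmf(h, lam) * poisson_pmf(a, mu) for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]


def market_probs(grid):
    """Sum the relevant cells of the score grid for each market."""
    cells = [(h, a, p) for h, row in enumerate(grid) for a, p in enumerate(row)]
    return {
        "over_2.5": sum(p for h, a, p in cells if h + a >= 3),
        "under_2.5": sum(p for h, a, p in cells if h + a <= 2),
        "btts_yes": sum(p for h, a, p in cells if h >= 1 and a >= 1),
        "home_-1.5": sum(p for h, a, p in cells if h - a >= 2),  # home wins by 2+
        "home_win": sum(p for h, a, p in cells if h > a),
        "draw": sum(p for h, a, p in cells if h == a),
        "away_win": sum(p for h, a, p in cells if h < a),
        "score_2_1": grid[2][1],
    }
```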
### Enhancements Beyond Basic Dixon-Coles
| Enhancement | What it does | Impact |
|-------------|-------------|--------|
| **Time decay** | Weight recent matches more heavily (half-life ~30 matches) | Captures current form |
| **xG integration** | Use expected goals instead of actual goals as input | Reduces noise from lucky/unlucky results |
| **Home advantage by league** | Different leagues have different home advantages | More accurate for specific leagues |
| **Promoted/relegated team adjustments** | New teams in a league have no history in that division; use proxy ratings | Avoids cold-start problem |
| **Referee features** | Cards model uses referee-specific foul/card rates | Critical for Phase 2 cards markets |
| **Injury/suspension data** | Adjust team strength when key players are missing | Meaningful for small squads in lower leagues |
### Open-Source Libraries to Use
| Library | Purpose | Language |
|---------|---------|----------|
| **penaltyblog** (Python) | Dixon-Coles model implementation, ready to use | Python |
| **footballpredictions** (Python) | Poisson-based football prediction toolkit | Python |
| **scikit-learn** | Feature engineering, model selection, cross-validation | Python |
| **scipy.stats** | Poisson distribution calculations | Python |
| **pandas** | Data manipulation and cleaning | Python |
| **numpy** | Numerical computation | Python |
---
## 7. Value Detection — When to Bet
### Edge Calculation
```
Model probability: 54.2% (your model says Over 2.5 goals)
Bookmaker odds: 2.15 (implies 46.5% probability)
Edge: 54.2% - 46.5% = +7.7%
```
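The same arithmetic as a sketch, with hypothetical helper names. It also computes expected value as stake × (p × odds − 1), which is the figure an "Expected Value" line in an alert should show. At 54.2% and odds of 2.15, that is about 16.5p per £1 staked, noticeably larger than the 7.7-point probability edge.

```python
def implied_prob(decimal_odds: float) -> float:
    """Break-even probability for a bet at these decimal odds."""
    return 1.0 / decimal_odds


def fair_odds(model_prob: float) -> float:
    """Decimal odds at which the model sees zero edge."""
    return 1.0 / model_prob


def edge(model_prob: float, decimal_odds: float) -> float:
    """Probability edge as used in this spec: model prob minus implied prob."""
    return model_prob - implied_prob(decimal_odds)


def expected_value(model_prob: float, decimal_odds: float, stake: float) -> float:
    """Expected profit in currency units: stake * (p * odds - 1)."""
    return stake * (model_prob * decimal_odds - 1.0)
```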
### Minimum Edge Thresholds
| Bankroll stage | Minimum edge to bet | Rationale |
|----------------|-------------------|-----------|
| £0 – £500 (Phase 1) | 5%+ | Conservative — proving the model works |
| £500 – £1,500 | 4%+ | Model has track record, can be slightly more aggressive |
| £1,500+ | 3%+ | Established edge, maximise volume |
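A hypothetical helper that applies these thresholds. The table does not say which stage a bankroll of exactly £500 or £1,500 belongs to. This sketch assumes the lower, more conservative stage includes the boundary, so the £500 starting bankroll uses the 5% threshold.

```python
def min_edge(bankroll: float) -> float:
    """Minimum probability edge for the bankroll stages in the thresholds table."""
    if bankroll <= 500:
        return 0.05
    if bankroll <= 1500:
        return 0.04
    return 0.03


def qualifies(model_prob: float, decimal_odds: float, bankroll: float) -> bool:
    """True if the model's probability edge clears the current stage's threshold."""
    return model_prob - 1.0 / decimal_odds >= min_edge(bankroll)
```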
### Confidence Rating System
Each bet gets a confidence score (1-5 stars) based on:
| Factor | Weight |
|--------|--------|
| Size of edge (bigger = more confident) | 30% |
| Model certainty (how tight is the probability estimate) | 25% |
| Data quality (how much history exists for these teams) | 20% |
| Odds movement (are odds moving toward or away from your price) | 15% |
| Market liquidity (can you actually get this bet on) | 10% |
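One way to turn these weights into a star rating, assuming each factor has already been scaled to a 0–1 score. How to scale a factor such as odds movement is a separate design decision the spec leaves open. The factor keys and function name are hypothetical, and the weighted total is split into five equal-width bands.

```python
WEIGHTS = {
    "edge": 0.30,
    "certainty": 0.25,
    "data_quality": 0.20,
    "odds_movement": 0.15,
    "liquidity": 0.10,
}


def confidence_stars(scores: dict[str, float]) -> int:
    """Map weighted 0..1 factor scores onto 1..5 stars."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    # Clamp each score to 0..1 so one bad input cannot dominate the total
    total = sum(w * min(max(scores[k], 0.0), 1.0) for k, w in WEIGHTS.items())
    return min(5, 1 + int(total * 5))  # five equal-width bands
```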
---
## 8. Bankroll Management
### The Rules (Non-Negotiable)
With a £500 starting bankroll, discipline is everything. One bad week of over-staking and you're wiped out.
| Rule | Setting | Rationale |
|------|---------|-----------|
| **Staking method** | Quarter-Kelly | Full Kelly is too aggressive for a new model. Quarter-Kelly keeps roughly 44% of full Kelly's long-run growth rate with about a quarter of the volatility |
| **Maximum single stake** | 3% of current bankroll (£15 at start) | Caps downside from any single bad bet |
| **Maximum daily exposure** | 10% of bankroll (£50 at start) | Prevents correlated losses from wiping you out on a single matchday |
| **Stop-loss trigger** | If bankroll drops 30% from peak (to £350), pause for 1 week and review model | Forces you to reassess rather than chase losses |
| **Minimum bankroll to continue** | £200 | Below this, the model needs fundamental revision before continuing |
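These rules can be enforced in code before a stake ever appears in an alert. The sketch assumes the daily cap is measured against the current bankroll and that `status` is checked before every bet. Resuming after a one-week pause is left as a manual decision. `BankrollState`, `status` and `capped_stake` are hypothetical names.

```python
from dataclasses import dataclass

MAX_SINGLE = 0.03          # 3% of current bankroll per bet
MAX_DAILY = 0.10           # 10% of bankroll staked per day
STOP_LOSS_DRAWDOWN = 0.30  # pause at 30% below peak
MIN_BANKROLL = 200.0       # stop entirely below this


@dataclass
class BankrollState:
    bankroll: float
    peak: float
    staked_today: float = 0.0


def status(state: BankrollState) -> str:
    """'halt' below the minimum, 'pause' at the drawdown stop-loss, else 'ok'."""
    if state.bankroll < MIN_BANKROLL:
        return "halt"
    if state.bankroll <= state.peak * (1 - STOP_LOSS_DRAWDOWN):
        return "pause"
    return "ok"


def capped_stake(proposed: float, state: BankrollState) -> float:
    """Apply the single-bet and daily-exposure caps; 0 when betting is paused."""
    if status(state) != "ok":
        return 0.0
    single_cap = MAX_SINGLE * state.bankroll
    daily_room = max(0.0, MAX_DAILY * state.bankroll - state.staked_today)
    return round(min(proposed, single_cap, daily_room), 2)
```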
### Quarter-Kelly Stake Formula
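```
Full Kelly % = (p × odds - 1) / (odds - 1)      p = model probability
             = (edge × odds) / (odds - 1)        edge = p - 1/odds (Section 7)
Quarter-Kelly % = Full Kelly % × 0.25
Example:
  p = 54.2%, Odds = 2.15 (edge = 7.7%)
  Full Kelly % = (0.542 × 2.15 - 1) / 1.15 = 0.1653 / 1.15 = 14.37%
  Quarter-Kelly % = 14.37% × 0.25 = 3.59%  (≈ £18 on £500)
  Capped at the 3% maximum single stake: Stake = £500 × 3% = £15.00
```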
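A small sketch of fractional Kelly sizing. It uses the full Kelly fraction for decimal odds, f* = (p × odds − 1) / (odds − 1), where p is the model probability. It then applies the 3% single-stake cap from the rules above and rounds to the nearest 50p. The rounding step and the function names are assumptions for illustration. For p = 54.2% at odds of 2.15, the quarter-Kelly fraction is about 3.6%, which the cap reduces to 3%.

```python
def kelly_fraction(model_prob: float, decimal_odds: float,
                   multiplier: float = 0.25) -> float:
    """Fractional Kelly stake as a share of bankroll (0 when there is no edge).

    Full Kelly for decimal odds: f* = (p * odds - 1) / (odds - 1).
    """
    full = (model_prob * decimal_odds - 1.0) / (decimal_odds - 1.0)
    return max(0.0, full * multiplier)


def kelly_stake(model_prob: float, decimal_odds: float, bankroll: float,
                multiplier: float = 0.25, max_fraction: float = 0.03) -> float:
    """Capped fractional-Kelly stake, rounded to the nearest 50p."""
    fraction = min(kelly_fraction(model_prob, decimal_odds, multiplier), max_fraction)
    return round(bankroll * fraction * 2) / 2
```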
### Bankroll Growth Projections (Illustrative)
Assuming 5% average edge across 200 bets at quarter-Kelly:
| Scenario | After 200 bets | After 500 bets |
|----------|---------------|----------------|
| Good run (55% hit rate) | ~£650 | ~£950 |
| Expected (52% hit rate) | ~£580 | ~£720 |
| Bad run (48% hit rate) | ~£430 | ~£400 |
**Key point:** Even with genuine edge, you should expect losing months. The system is designed for long-run profitability, not instant returns.
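Projections like these depend heavily on the average odds and on the true (not model) win probability. A Monte Carlo simulation is a rough way to explore the spread of outcomes. This sketch makes several hypothetical simplifications: every bet is at the same decimal odds, the true win probability is fixed, and staking is quarter-Kelly capped at 3%. It does not reproduce the table's figures. Setting `true_prob` below `model_prob` shows what happens when the model overstates its edge.

```python
import random
import statistics


def simulate(n_bets, true_prob, model_prob, odds, bankroll=500.0,
             multiplier=0.25, max_fraction=0.03, seed=None):
    """Final bankroll after n_bets identical bets staked by capped fractional Kelly."""
    rng = random.Random(seed)
    full = (model_prob * odds - 1) / (odds - 1)  # stake sized from the model's view
    fraction = min(max(0.0, full * multiplier), max_fraction)
    for _ in range(n_bets):
        stake = bankroll * fraction
        if rng.random() < true_prob:  # outcome drawn from the true probability
            bankroll += stake * (odds - 1)
        else:
            bankroll -= stake
    return bankroll


def summarise(runs=2000, **kwargs):
    """Percentiles of the final bankroll over many seeded runs."""
    start = kwargs.get("bankroll", 500.0)
    finals = sorted(simulate(seed=i, **kwargs) for i in range(runs))
    return {
        "p10": finals[int(0.10 * runs)],
        "median": statistics.median(finals),
        "p90": finals[int(0.90 * runs)],
        "prob_below_start": sum(f < start for f in finals) / runs,
    }
```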
---
## 9. Bookmaker Strategy
### Primary Platforms
| Platform | Why | Limitation risk |
|----------|-----|-----------------|
| **Pinnacle** | Does NOT limit winners. Sharp odds. Asian handicaps on lower leagues. Your anchor bookmaker. | None — this is your primary venue |
| **Betfair Exchange** | No limits. You set your own odds. | Liquidity thin for lower-league derivatives |
| **Bet365** | Widest market coverage for lower leagues. Good for corners/cards. | WILL limit you after sustained winning. Use while it lasts. |
| **Unibet / 888sport** | Decent coverage, slightly slower to limit | Will limit eventually |
| **Betway** | Good for specific derivatives | Will limit eventually |
### Account Management Strategy
1. Open accounts with ALL of the above before you start winning
2. Use Pinnacle as your primary; its stated policy is not to restrict winning accounts
3. Spread bets across the others to delay account restrictions
4. Vary your stake sizes slightly (don't always bet exact Kelly amounts)
5. Mix in some recreational-looking bets occasionally on traditional bookmakers
6. When you get limited on a bookmaker, move that volume to Pinnacle/Betfair
7. Consider using a betting exchange bot for Betfair (open-source options exist)
---
## 10. Telegram Bot — Alert Format
### Setup
- Create a Telegram bot via @BotFather
- Get your chat ID (send the bot a message, then read `message.chat.id` from the Bot API's `getUpdates` response)
- The system sends HTTP POST requests to the Telegram Bot API
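A minimal standard-library sketch of the `sendMessage` call (python-telegram-bot wraps the same Bot API). The optional inline keyboard shows how buttons like [Skip] could be attached. Receiving the button taps needs a separate `getUpdates` or webhook loop, which is not shown. `build_send_message` and `send_alert` are hypothetical names.

```python
import json
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_send_message(token, chat_id, text, buttons=None):
    """Build a POST request for the Bot API sendMessage method."""
    body = {"chat_id": chat_id, "text": text}
    if buttons:
        # One row of inline buttons; Telegram sends callback_data back when tapped.
        body["reply_markup"] = {"inline_keyboard": [[
            {"text": label, "callback_data": data} for label, data in buttons
        ]]}
    return urllib.request.Request(
        f"{API_ROOT}/bot{token}/sendMessage",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(token, chat_id, text, buttons=None):
    """Send the message (requires network access and a real bot token)."""
    request = build_send_message(token, chat_id, text, buttons)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)  # {"ok": true, "result": {...}} on success
```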
### Alert Types
**Value Bet Alert (main output):**
```
🔔 VALUE BET FOUND
Match: Barnsley vs Leyton Orient
League: League One 🏴
Kickoff: Sat 5 Apr, 15:00
Market: Over 2.5 Goals
Best Odds: Pinnacle @ 2.15
Model Prob: 54.2% | Implied: 46.5%
Edge: +7.7%
Stake: £15.00 (3.0% of bankroll, capped)
Kelly: Quarter-Kelly
Confidence: ★★★☆☆
Model Notes: Both teams averaging 3.1 goals/game
over last 10 matches. Referee avg 5.2 fouls/game.
[Place on Pinnacle] [Skip] [Log as placed]
```
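A sketch of building the core of this alert text. The `ValueBet` fields are hypothetical, and the model notes and buttons are omitted. Implied probability and edge are recomputed from the odds so the message cannot drift from the numbers it reports.

```python
from dataclasses import dataclass


@dataclass
class ValueBet:
    home: str
    away: str
    league: str
    kickoff: str
    market: str
    bookmaker: str
    odds: float
    model_prob: float
    stake: float
    bankroll: float
    stars: int


def format_alert(bet: ValueBet) -> str:
    """Render the value-bet alert as plain text, one field per line."""
    implied = 1 / bet.odds
    edge = bet.model_prob - implied
    return "\n".join([
        "🔔 VALUE BET FOUND",
        f"Match: {bet.home} vs {bet.away}",
        f"League: {bet.league}",
        f"Kickoff: {bet.kickoff}",
        f"Market: {bet.market}",
        f"Best Odds: {bet.bookmaker} @ {bet.odds:.2f}",
        f"Model Prob: {bet.model_prob:.1%} | Implied: {implied:.1%}",
        f"Edge: {edge:+.1%}",
        f"Stake: £{bet.stake:.2f} ({bet.stake / bet.bankroll:.1%} of bankroll)",
        f"Confidence: {'★' * bet.stars}{'☆' * (5 - bet.stars)}",
    ])
```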
**Daily Summary (end of day):**
```
📊 DAILY SUMMARY — Sat 5 Apr
Bets recommended: 4
Bets placed: 3
Results so far: 2W / 1L / 0P
Today's P&L: +£14.20
Week P&L: +£31.50
Bankroll: £531.50
Best bet: Barnsley vs Leyton Orient O2.5 ✅ (+£17.25)
Worst bet: Exeter vs Mansfield BTTS ❌ (-£6.00)
```
**Weekly Performance Report:**
```
📈 WEEKLY REPORT — W/C 31 Mar
Bets: 18 recommended | 15 placed
Record: 9W / 6L
Hit Rate: 60.0%
Average Edge: 6.2%
ROI: +8.4%
CLV: +3.1% (beating closing lines)
Bankroll: £541.20 (+£41.20 / +8.2%)
Top league: League Two (+£28.40)
Top market: Over 2.5 Goals (+£22.10)
```
---
## 11. Bet Tracking & Performance Monitoring
### What to Track for Every Bet
| Field | Why |
|-------|-----|
| Date and time | Basic record |
| Match | What the bet was on |
| League | To measure which leagues your model is best at |
| Market | To measure which bet types have most edge |
| Model probability | Your model's estimate |
| Odds taken | What you actually got |
| Closing odds | What the odds were at kickoff; needed to calculate CLV, the key metric |
| Stake | How much you bet |
| Result | Win/Loss/Push |
| P&L | Profit or loss in £ |
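A sketch of these fields as a local SQLite table, using Python's built-in `sqlite3` module. The table, column and function names are hypothetical. The column is called `fixture` rather than `match` to avoid SQLite's `MATCH` keyword. P&L is derived from the stored stake and odds rather than typed in by hand.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS bets (
    id            INTEGER PRIMARY KEY,
    placed_at     TEXT NOT NULL,   -- ISO 8601 date and time
    fixture       TEXT NOT NULL,   -- e.g. 'Barnsley vs Leyton Orient'
    league        TEXT NOT NULL,
    market        TEXT NOT NULL,
    model_prob    REAL NOT NULL,
    odds_taken    REAL NOT NULL,
    closing_odds  REAL,            -- filled in after kickoff
    stake         REAL NOT NULL,
    result        TEXT CHECK (result IN ('win', 'loss', 'push')),
    pnl           REAL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_bet(conn, placed_at, fixture, league, market, model_prob, odds_taken, stake):
    """Insert a placed bet and return its row id."""
    cur = conn.execute(
        "INSERT INTO bets (placed_at, fixture, league, market, model_prob, odds_taken, stake)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (placed_at, fixture, league, market, model_prob, odds_taken, stake),
    )
    conn.commit()
    return cur.lastrowid


def settle(conn, bet_id, result, closing_odds):
    """Store the result and closing odds; P&L is derived from stake and odds taken."""
    stake, odds = conn.execute(
        "SELECT stake, odds_taken FROM bets WHERE id = ?", (bet_id,)).fetchone()
    pnl = {"win": stake * (odds - 1), "loss": -stake, "push": 0.0}[result]
    conn.execute("UPDATE bets SET result = ?, pnl = ?, closing_odds = ? WHERE id = ?",
                 (result, pnl, closing_odds, bet_id))
    conn.commit()
    return pnl
```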
### Key Performance Metrics
| Metric | Target | What it means |
|--------|--------|---------------|
| **Closing Line Value (CLV)** | Positive | You're consistently betting at better odds than the market settles at. THE most important metric. |
| **ROI** | 3-8% | Return on investment across all bets. Above 5% long-term is excellent. |
| **Hit Rate** | Varies by odds | Meaningless in isolation — a 45% hit rate at average odds of 2.30 is profitable |
| **Yield by League** | Positive per league | Shows where your model has most edge |
| **Yield by Market** | Positive per market | Shows which bet types to focus on |
| **Drawdown** | < 30% from peak | If you hit this, pause and reassess |
| **Calibration** | Model prob ≈ actual frequency | If you say 55%, it should win ~55% of the time |
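A sketch of computing ROI, CLV and a calibration table from settled bets held as dictionaries. The field names are hypothetical. CLV here uses one common definition, odds taken divided by closing odds minus one; a stricter version first removes the bookmaker's margin from the closing price. The calibration table groups bets into 5-point probability bands and compares the mean predicted probability with the actual hit rate in each band.

```python
from collections import defaultdict


def roi(bets):
    """Total P&L divided by total amount staked."""
    staked = sum(b["stake"] for b in bets)
    return sum(b["pnl"] for b in bets) / staked if staked else 0.0


def average_clv(bets):
    """Mean of odds_taken / closing_odds - 1 over bets with closing odds."""
    vals = [b["odds_taken"] / b["closing_odds"] - 1 for b in bets if b.get("closing_odds")]
    return sum(vals) / len(vals) if vals else 0.0


def calibration(bets, bucket_width=0.05):
    """Rows of (band lower bound, count, mean model prob, actual win rate)."""
    buckets = defaultdict(list)
    for b in bets:
        if b["result"] == "push":
            continue  # pushes are neither wins nor losses
        buckets[int(b["model_prob"] / bucket_width)].append(b)
    table = []
    for key in sorted(buckets):
        group = buckets[key]
        predicted = sum(b["model_prob"] for b in group) / len(group)
        actual = sum(b["result"] == "win" for b in group) / len(group)
        table.append((round(key * bucket_width, 2), len(group), predicted, actual))
    return table
```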
### Model Recalibration Triggers
Re-train the model when any of these happen:
- 100 new bets have been placed since last calibration
- A new season starts
- Rolling average CLV over the last 50+ bets turns negative
- A league's prediction accuracy drops below a defined threshold
- Major rule changes (e.g., VAR introduction in a league)
---
## 12. Technology Stack
### All Open Source
| Component | Tool | Why |
|-----------|------|-----|
| Language | **Python 3.11+** | Best ecosystem for data science and sports modeling |
| Data storage | **SQLite** → upgrade to **PostgreSQL** if scaling | Simple to start, no server needed |
| Model | **penaltyblog** + **scipy** + **scikit-learn** | Dixon-Coles implementation + statistical tools |
| Data collection | **requests** + **BeautifulSoup** + **pandas** | Web scraping and API calls |
| Odds API | **The Odds API** (free tier) | Real-time bookmaker odds |
| Scheduling | **cron** (Linux/Mac) or **schedule** (Python) | Runs the pipeline daily at 7am |
| Telegram | **python-telegram-bot** | Sends alerts to your phone |
| Tracking | **Google Sheets API** or local **SQLite** | Logs bets and tracks performance |
| Visualization | **matplotlib** + **seaborn** | Weekly performance charts |
| Version control | **Git** | Track model changes and performance over time |
### Hosting Options
| Option | Cost | Suitability |
|--------|------|-------------|
| **Your own laptop** (cron job) | Free | Fine to start, but needs to be on at 7am |
| **Raspberry Pi** | ~£50 one-off | Always-on, low power, runs the daily job |
| **Free cloud tier** (Oracle Cloud, Google Cloud free tier) | Free | Always-on VM, runs cron daily |
| **Cheap VPS** (Hetzner, DigitalOcean) | ~£4/month | Most reliable option for daily automation |
---
## 13. Build Phases & Timeline
### Phase 1: Foundation (Weeks 1-3)
- [ ] Set up Python project with dependencies
- [ ] Download historical data from Football-Data.co.uk (English League One, League Two, Championship, Scottish Premiership)
- [ ] Build data cleaning and feature engineering pipeline
- [ ] Implement Dixon-Coles model using penaltyblog
- [ ] Backtest on 2 seasons of historical data
- [ ] Measure: Is the model calibrated? Does it find value historically?
### Phase 2: Odds & Value Detection (Weeks 3-4)
- [ ] Integrate The Odds API for live bookmaker odds
- [ ] Build the value detection engine (model prob vs bookmaker implied prob)
- [ ] Implement bankroll management (quarter-Kelly, caps, stop-loss)
- [ ] Build the edge calculation and confidence scoring
- [ ] Paper test for 1 week (recommend bets but don't place them)
### Phase 3: Telegram & Automation (Week 5)
- [ ] Create Telegram bot
- [ ] Build alert message formatting
- [ ] Set up daily cron job / scheduled task
- [ ] Build daily summary and weekly report generation
- [ ] Test end-to-end pipeline on live fixtures
### Phase 4: Go Live (Week 6+)
- [ ] Open bookmaker accounts (Pinnacle, Bet365, Betfair, Unibet, Betway)
- [ ] Start with small stakes (half of the recommended quarter-Kelly stake, i.e. one-eighth Kelly)
- [ ] Place bets based on alerts
- [ ] Log results in tracking system
- [ ] Run for 4 weeks before assessing model performance
### Phase 5: Expand (Month 3+)
- [ ] Add Scandinavian leagues (Danish, Norwegian, Swedish)
- [ ] Build corners model (separate from goals model)
- [ ] Build cards model (referee-dependent)
- [ ] Add xG features from FBref/Understat
- [ ] Evaluate model performance and recalibrate
---
## 14. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Model has no real edge | Medium | Fatal | Extensive backtesting before going live. Paper trade first. Require positive CLV over 100+ bets before scaling. |
| Account restrictions on bookmakers | High | Medium | Primary venue is Pinnacle (stated policy of not restricting winners). Spread across multiple books. |
| Data quality issues | Medium | High | Cross-reference multiple sources. Flag anomalies automatically. |
| Overfitting to historical data | Medium | High | Out-of-sample testing. Walk-forward validation. Keep model simple. |
| Bankroll wiped by variance | Low | Fatal | Quarter-Kelly staking. 3% max single stake. 30% drawdown stop-loss. |
| Odds API downtime | Low | Low | Cache odds. Fall back to manual checking. |
| Emotional interference (chasing losses, ignoring the model) | High | High | Automate as much as possible. Follow the alerts mechanically. Review weekly, not bet-by-bet. |
---
## 15. Success Criteria
After 3 months (minimum 150 bets placed):
| Metric | Pass | Fail |
|--------|------|------|
| CLV (Closing Line Value) | Consistently positive | Negative or zero |
| ROI | > 2% | < 0% |
| Model calibration | Within 3 percentage points in every probability band | Systematic bias |
| Bankroll | Above £400 (comfortably above the £350 stop-loss) | Below £350 |
| Emotional discipline | Following the system mechanically | Making off-system bets |
**If all "Pass" → Scale up bankroll and add Phase 2 markets.**
**If any "Fail" → Diagnose, recalibrate, or pivot to a different approach.**
---
## 16. What This System Does NOT Do
- Does NOT place bets automatically (you place them manually based on alerts)
- Does NOT guarantee profit — it gives you a statistical edge, not certainty
- Does NOT work in-play — this is a pre-match system only
- Does NOT replace discipline — the bankroll rules must be followed mechanically
- Does NOT account for inside information or match-fixing — these are model risks you accept
---
## Appendix: Key Terms
| Term | Meaning |
|------|---------|
| **Edge** | The difference between your model's probability and the bookmaker's implied probability. A positive edge means you believe the bet is underpriced. |
| **CLV (Closing Line Value)** | Whether you got better odds than the market's final price. The single best predictor of long-term profitability. |
| **Kelly Criterion** | A mathematical formula for optimal bet sizing based on your edge and the odds offered. |
| **Quarter-Kelly** | Betting 25% of what full Kelly recommends. Keeps roughly 44% of full Kelly's growth rate (sacrificing ~56%) but dramatically reduces variance and risk of ruin. |
| **Dixon-Coles** | A widely-used statistical model for predicting football match scores, based on Poisson distributions with adjustments for low-scoring outcomes. |
| **Poisson Distribution** | A probability distribution that models the number of events (goals) in a fixed time period, given a known average rate. |
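| **Asian Handicap** | A bet where one team is given a virtual goal advantage/disadvantage. Half-goal lines (e.g. -0.5, -1.5) eliminate the draw outcome, creating a two-way market; whole-goal lines can push, returning the stake. |
| **Implied Probability** | The probability implied by bookmaker odds. Calculated as 1/odds. E.g., odds of 2.00 imply 50% probability. Because of the bookmaker's margin, the implied probabilities of all outcomes in a market sum to more than 100%. |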
| **Pinnacle** | A bookmaker known for sharp odds and a policy of not restricting winning accounts. The gold standard for serious bettors. |
| **Paper Trading** | Running the system and tracking hypothetical bets without actually placing them. Used to validate the model before risking real money. |