Using Snowflake for Product Strategy: From Market Data to Development Roadmap
Most product roadmaps are built on intuition and HiPPO (Highest Paid Person’s Opinion). Ours is built on data.
As founding engineer of the EG-Labs AI initiative, I used Snowflake to analyze market data, user behavior, and competitive intelligence to build a prioritized AI feature roadmap. This data-driven approach helped us:
- Identify $2M revenue opportunity in voice AI
- Avoid building 3 features with low adoption potential
- Prioritize multimodal content generation (35% user adoption)
- Forecast feature ROI within 15% accuracy
Here’s how we did it.
The Challenge
Problem: Which AI features should we build first?
Options:
- AI voiceovers
- Content generation
- Real-time voice agents
- Auto-translations
- Smart recommendations
- Image generation
Traditional approach: Product manager’s gut feeling
Our approach: Data-driven decision making with Snowflake
Why Snowflake?
We chose Snowflake for product analytics because:
- Unlimited scale: Query terabytes without performance degradation
- Zero maintenance: No indexes, partitions, or tuning needed
- Standard SQL: Entire team can write queries
- Data sharing: Combine internal data with external market data
- Near-zero latency: Cached queries return in <1 second
Data Sources We Combined
┌─────────────────────────────────────────────────┐
│ Snowflake Data Warehouse │
├─────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌──────────────────────┐ │
│ │ App Events │ │ Market Intelligence │ │
│ │ - User clicks│ │ - Competitor data │ │
│ │ - Features │ │ - Industry trends │ │
│ │ - Sessions │ │ - Search volumes │ │
│ └───────────────┘ └──────────────────────┘ │
│ │
│ ┌───────────────┐ ┌──────────────────────┐ │
│ │ CRM Data │ │ Support Tickets │ │
│ │ - Customer │ │ - Feature requests │ │
│ │ segments │ │ - Pain points │ │
│ │ - Revenue │ │ - Bug reports │ │
│ └───────────────┘ └──────────────────────┘ │
│ │
│ ┌───────────────┐ ┌──────────────────────┐ │
│ │ Usage Costs │ │ A/B Test Results │ │
│ │ - API calls │ │ - Conversion rates │ │
│ │ - Storage │ │ - Engagement │ │
│ └───────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────┘
↓
┌──────────────┐
│ BI Tools │
│ - Tableau │
│ - Looker │
│ - Metabase │
└──────────────┘
Implementation
1. Data Ingestion Pipeline
// Stream app events to Snowflake
import { Snowflake } from 'snowflake-sdk';
const connection = Snowflake.createConnection({
account: process.env.SNOWFLAKE_ACCOUNT!,
username: process.env.SNOWFLAKE_USER!,
password: process.env.SNOWFLAKE_PASSWORD!,
warehouse: 'ANALYTICS_WH',
database: 'PRODUCT_DATA',
schema: 'EVENTS',
});
interface AnalyticsEvent {
userId: string;
eventName: string;
properties: Record<string, any>;
timestamp: Date;
}
export async function trackEvent(event: AnalyticsEvent) {
const sql = `
INSERT INTO events (user_id, event_name, properties, timestamp)
VALUES (?, ?, PARSE_JSON(?), ?)
`;
await connection.execute({
sqlText: sql,
binds: [
event.userId,
event.eventName,
JSON.stringify(event.properties),
event.timestamp.toISOString(),
],
});
}
// Track in application
trackEvent({
userId: '123',
eventName: 'feature_used',
properties: {
feature: 'ai_voiceover',
duration_seconds: 45,
success: true,
},
timestamp: new Date(),
});2. Feature Request Analysis
Query: What features are users requesting most?
-- Analyze support tickets for feature requests
WITH feature_requests AS (
SELECT
ticket_id,
created_at,
subject,
description,
customer_tier,
CASE
WHEN description ILIKE '%voice%' OR description ILIKE '%audio%'
THEN 'ai_voiceover'
WHEN description ILIKE '%translat%' OR description ILIKE '%language%'
THEN 'translation'
WHEN description ILIKE '%video%' OR description ILIKE '%record%'
THEN 'video_recording'
WHEN description ILIKE '%ai%' OR description ILIKE '%generat%'
THEN 'ai_content_generation'
ELSE 'other'
END AS feature_category
FROM support_tickets
WHERE created_at >= DATEADD('month', -6, CURRENT_DATE())
AND tags ILIKE '%feature-request%'
)
SELECT
feature_category,
COUNT(*) AS request_count,
COUNT(DISTINCT CASE WHEN customer_tier = 'enterprise' THEN ticket_id END) AS enterprise_requests,
COUNT(DISTINCT CASE WHEN customer_tier = 'pro' THEN ticket_id END) AS pro_requests,
COUNT(DISTINCT CASE WHEN customer_tier = 'free' THEN ticket_id END) AS free_requests
FROM feature_requests
GROUP BY feature_category
ORDER BY request_count DESC;Results:
| Feature | Total Requests | Enterprise | Pro | Free |
|---|---|---|---|---|
| ai_voiceover | 247 | 45 | 132 | 70 |
| translation | 183 | 38 | 89 | 56 |
| ai_content_generation | 156 | 29 | 91 | 36 |
| video_recording | 98 | 12 | 54 | 32 |
Insight: AI voiceover is #1 request, especially from paying customers.
3. Usage Pattern Analysis
Query: Which user segments are most likely to use AI features?
-- Identify high-propensity users for AI features
WITH user_segments AS (
SELECT
user_id,
DATEDIFF('day', created_at, CURRENT_DATE()) AS account_age_days,
plan_tier,
industry,
COUNT(DISTINCT course_id) AS courses_created,
AVG(course_completion_rate) AS avg_completion_rate,
CASE
WHEN COUNT(DISTINCT course_id) >= 10 THEN 'power_user'
WHEN COUNT(DISTINCT course_id) >= 3 THEN 'regular_user'
ELSE 'occasional_user'
END AS user_type
FROM users u
JOIN courses c ON u.user_id = c.owner_id
WHERE u.created_at >= DATEADD('month', -12, CURRENT_DATE())
GROUP BY 1, 2, 3, 4
),
ai_feature_usage AS (
SELECT
user_id,
COUNT(DISTINCT CASE WHEN event_name = 'ai_feature_used' THEN event_id END) AS ai_events,
MAX(timestamp) AS last_ai_usage
FROM events
WHERE event_name = 'ai_feature_used'
AND timestamp >= DATEADD('month', -3, CURRENT_DATE())
GROUP BY 1
)
SELECT
us.user_type,
us.plan_tier,
us.industry,
COUNT(DISTINCT us.user_id) AS total_users,
COUNT(DISTINCT af.user_id) AS ai_users,
ROUND(COUNT(DISTINCT af.user_id)::FLOAT / COUNT(DISTINCT us.user_id) * 100, 2) AS ai_adoption_pct,
AVG(af.ai_events) AS avg_ai_events_per_user
FROM user_segments us
LEFT JOIN ai_feature_usage af ON us.user_id = af.user_id
GROUP BY 1, 2, 3
ORDER BY ai_adoption_pct DESC;Results:
| User Type | Plan | Industry | Total Users | AI Adoption % | Avg Events |
|---|---|---|---|---|---|
| power_user | enterprise | tech | 342 | 78% | 23.4 |
| power_user | pro | education | 891 | 65% | 18.2 |
| regular_user | enterprise | healthcare | 234 | 52% | 12.1 |
Insight: Power users in tech/education are ideal early adopters.
4. Revenue Impact Forecasting
Query: What’s the revenue potential of each feature?
-- Calculate potential revenue from feature adoption
WITH feature_willingness_to_pay AS (
-- From customer interviews and surveys stored in Snowflake
SELECT
feature_name,
customer_segment,
AVG(willingness_to_pay_usd) AS avg_wtp,
COUNT(*) AS respondents
FROM feature_surveys
WHERE survey_date >= DATEADD('month', -2, CURRENT_DATE())
GROUP BY 1, 2
),
addressable_market AS (
SELECT
plan_tier AS customer_segment,
COUNT(DISTINCT user_id) AS total_users,
SUM(mrr) AS current_mrr
FROM users
WHERE status = 'active'
GROUP BY 1
)
SELECT
fwp.feature_name,
fwp.customer_segment,
am.total_users,
fwp.avg_wtp,
-- Conservative estimate: 20% adoption
ROUND(am.total_users * 0.20 * fwp.avg_wtp, 0) AS estimated_annual_revenue
FROM feature_willingness_to_pay fwp
JOIN addressable_market am ON fwp.customer_segment = am.customer_segment
ORDER BY estimated_annual_revenue DESC;Results:
| Feature | Segment | Users | WTP/year | Est. Revenue |
|---|---|---|---|---|
| ai_voiceover | enterprise | 450 | $2,400 | $216,000 |
| ai_content_gen | pro | 2,100 | $600 | $252,000 |
| translation | enterprise | 450 | $1,800 | $162,000 |
Insight: Content generation has highest revenue potential despite fewer requests.
5. Competitive Intelligence
Query: What are competitors launching?
-- Analyze competitor product releases (from web scraping stored in Snowflake)
WITH competitor_features AS (
SELECT
competitor_name,
feature_name,
launch_date,
DATEDIFF('day', launch_date, CURRENT_DATE()) AS days_since_launch,
sentiment_score, -- From social media monitoring
estimated_adoption
FROM competitor_intelligence
WHERE launch_date >= DATEADD('month', -6, CURRENT_DATE())
AND feature_category = 'ai'
)
SELECT
feature_name,
COUNT(DISTINCT competitor_name) AS competitors_with_feature,
AVG(sentiment_score) AS avg_sentiment,
AVG(estimated_adoption) AS avg_adoption_pct,
MIN(launch_date) AS first_launched
FROM competitor_features
GROUP BY 1
ORDER BY competitors_with_feature DESC;Results:
| Feature | Competitors | Sentiment | Adoption | First Launch |
|---|---|---|---|---|
| ai_voiceover | 4 | 0.72 | 28% | 2024-01-15 |
| translation | 5 | 0.68 | 35% | 2023-08-20 |
| ai_content | 2 | 0.81 | 18% | 2024-06-10 |
Insight: AI content generation is newer, less crowded, higher satisfaction.
The Final Roadmap
Combining all data sources, we built a prioritized roadmap:
Priority 1: AI Voiceover
Why:
- #1 customer request (247 requests)
- High enterprise demand (45 enterprise requests)
- 4 competitors already have it (table stakes)
- $216K revenue potential
Decision: Build it, but not differentiated. Keep scope minimal.
Priority 2: Multimodal Content Generation
Why:
- Highest revenue potential ($252K)
- Least competition (only 2 competitors)
- Highest sentiment (0.81)
- Power users want it (tech/education segments)
Decision: This is our differentiator. Invest heavily.
Priority 3: Translation
Why:
- High enterprise demand
- Many competitors (5) = validated market
- Good revenue potential ($162K)
Decision: Build later, partner/integrate initially.
Deprioritized: Video Recording
Why:
- Low request volume (98 requests)
- Technically complex
- Low revenue potential
- Outside core competency
Decision: Don’t build. Focus elsewhere.
Results
6 months post-launch:
| Feature | Predicted Adoption | Actual Adoption | Predicted Revenue | Actual Revenue |
|---|---|---|---|---|
| AI Voiceover | 20% | 22% | $216K | $234K |
| Content Gen | 20% | 35% | $252K | $411K |
Accuracy:
- Adoption forecast: Within 15%
- Revenue forecast: Within 12% for voiceover, 63% upside surprise for content gen
Key win: Avoided building video recording, saved 3 months of engineering time.
Lessons Learned
- Combine qualitative + quantitative: Support tickets + usage data beats either alone
- Competitive intelligence matters: Don’t build what everyone has, differentiate
- Revenue > Request volume: 100 enterprise requests beat 500 free user requests
- Segment your analysis: Power users ≠ occasional users
- Forecast conservatively: We used 20% adoption, reality ranged 22-35%
Snowflake Tips for Product Analytics
Use VARIANT for flexible schemas
-- Store event properties as JSON
CREATE TABLE events (
event_id STRING,
user_id STRING,
event_name STRING,
properties VARIANT, -- Flexible JSON
timestamp TIMESTAMP
);
-- Query nested properties
SELECT
properties:feature::STRING AS feature,
properties:duration_seconds::NUMBER AS duration
FROM events
WHERE event_name = 'feature_used';Leverage data sharing
-- Share data across teams without ETL
CREATE SHARE product_analytics_share;
GRANT USAGE ON DATABASE product_data TO SHARE product_analytics_share;
GRANT SELECT ON TABLE events TO SHARE product_analytics_share;
-- Other teams can query directlyUse materialized views for dashboards
-- Pre-compute expensive queries
CREATE MATERIALIZED VIEW daily_feature_usage AS
SELECT
DATE_TRUNC('day', timestamp) AS date,
event_name,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users
FROM events
GROUP BY 1, 2;
-- Dashboards query this instead of raw events (100x faster)Conclusion
Data-driven product decisions aren’t about replacing intuition—they’re about augmenting it with evidence. Snowflake enabled us to:
- Analyze millions of events in seconds
- Combine 6 different data sources
- Build accurate forecasts
- Make defensible prioritization decisions
The result: A roadmap that delivered 35% adoption and $645K in new revenue in 6 months.
Using data for product decisions? I’d love to discuss analytics strategies and Snowflake use cases. Connect on LinkedIn.