A curated collection of 300+ engineering blog articles from top tech companies. Learn how the best engineering teams solve real-world problems at scale.
Agoda
| Blog | Year | Read |
|---|---|---|
| How Agoda manages 450 million+ property images | 2025 | Link |
| How Agoda built its property bot to cut response time from 8 hours to seconds | 2024 | Link |
| How Agoda migrated its GraphQL Monolith API service to Microservices | 2024 | Link |
| How Agoda Solved Retry Storms to Boost System Reliability | 2024 | Link |
| How Agoda Designs and Maintains a High-Performing Data Pipeline | 2023 | Link |
| How Agoda manages 1.8 trillion Events per day on Kafka | 2023 | Link |
| How Agoda indexes hundreds of millions of series in a time-series database | 2022 | Link |
Airbnb
| Blog | Year | Read |
|---|---|---|
| Embedding-Based Retrieval for Airbnb Search | 2025 | Link |
| How Airbnb improved page performance using HTTP Streaming | 2023 | Link |
| Airbnb’s Data Framework for faster and more reliable read-heavy workloads | 2023 | Link |
| Avoiding Double Payments in a Distributed Payments System | 2019 | Link |
Amazon Science
| Blog | Year | Read |
|---|---|---|
| Training code generation models to debug their own outputs | 2025 | Link |
| The technology behind Amazon’s GenAI-powered shopping assistant, Rufus | 2024 | Link |
| Ensuring that customers don't miss out on trending products | 2023 | Link |
| From structured search to learning-to-rank-and-retrieve | 2023 | Link |
| Invalidating robotic ad clicks in real time | 2023 | Link |
| Using large language models (LLMs) to synthesize training data | 2023 | Link |
| Lessons learned from 10 years of DynamoDB | 2022 | Link |
| Using graph neural networks to recommend related products | 2022 | Link |
Atlassian
| Blog | Year | Read |
|---|---|---|
| How Atlassian Scaled and Enhanced Throughput in the Jira Export Service | 2025 | Link |
| How one of Atlassian’s critical services consistently achieves over 99.9999% availability | 2022 | Link |
| How Atlassian made Git push over HTTPS faster for Bitbucket Cloud | 2022 | Link |
| How Atlassian Revamped Confluence Cloud Search | 2021 | Link |
| Caching JQL search in Jira Cloud | 2021 | Link |
| Scaling, rearchitecting, and decomposing Confluence Cloud | 2020 | Link |
| Scaling Bitbucket’s Database | 2020 | Link |
| Atlassian's journey scaling low latency, multi-region services on AWS | 2019 | Link |
Auth0
| Blog | Year | Read |
|---|---|---|
| Build an AI Assistant with LangGraph, Vercel, and Next.js | 2025 | Link |
| Building a Secure RAG with Python, LangChain, and OpenFGA | 2025 | Link |
| Identity Challenges for AI-Powered Applications | 2024 | Link |
Booking.com
| Blog | Year | Read |
|---|---|---|
| Anomaly Detection in Time Series Using Statistical Analysis | 2025 | Link |
| How Booking Cut 20% of the Cloud Cost with a Single Code Change | 2025 | Link |
| The Engineering Behind Booking.com's High-Performance Ranking Platform | 2024 | Link |
| How Booking.com Leverages graph technology for real-time Fraud Detection and Prevention | 2024 | Link |
| How Booking.com Predicts cancellations with survival modeling | 2024 | Link |
Canva
| Blog | Year | Read |
|---|---|---|
| Canva’s continuous data platform | 2025 | Link |
| How Canva's drawing tool works | 2024 | Link |
| How Canva collects 25 billion events per day | 2024 | Link |
| Canva's scalable and reliable content usage counting service | 2024 | Link |
| How Canva saves millions annually in Amazon S3 costs | 2023 | Link |
| How Canva scaled media uploads from Zero to 50 Million per day | 2022 | Link |
| Canva's fast and scalable reverse image search | 2022 | Link |
| How Canva enables real-time collaboration with RSocket | 2021 | Link |
Coinbase
| Blog | Year | Read |
|---|---|---|
| How Coinbase Optimizes Network Requests | 2024 | Link |
| Accelerating Deep Learning Adoption at Coinbase | 2024 | Link |
| Lessons from launching Enterprise-grade GenAI solutions at Coinbase | 2024 | Link |
| How Coinbase Uses ML to Predict Traffic and Scale Databases | 2024 | Link |
| Detecting Fraudulent Transactions at Coinbase | 2023 | Link |
| Building a notification platform at Coinbase | 2022 | Link |
Discord
| Blog | Year | Read |
|---|---|---|
| How Discord Reduced Websocket Traffic by 40% | 2024 | Link |
| How Discord Stores Trillions of Messages | 2023 | Link |
| Pushing Discord’s Limits with a Million+ Online Users in a Single Server | 2023 | Link |
| How Discord uses ML to Build a Delightful Notification Experience | 2022 | Link |
| How Discord Creates Insights from Trillions of Data Points | 2021 | Link |
DoorDash
| Blog | Year | Read |
|---|---|---|
| How DoorDash Uses LLMs to transcribe restaurant menu photos | 2025 | Link |
| How DoorDash leverages LLMs for better search retrieval | 2024 | Link |
| Building DoorDash’s product knowledge graph with large language models | 2024 | Link |
| DoorDash’s in-house search engine | 2024 | Link |
| DoorDash's write-heavy scalable and reliable inventory platform | 2023 | Link |
| DoorDash's scalable real time event processing with Kafka and Flink | 2022 | Link |
| DoorDash’s Lessons on Improving Performance on High-Traffic Web Pages | 2022 | Link |
| How DoorDash Applied Client-Side Caching to Improve Feature Store Performance by 70% | 2022 | Link |
| Building a Unified Chat Experience at DoorDash | 2022 | Link |
Dropbox
| Blog | Year | Read |
|---|---|---|
| How Dropbox evolved its infrastructure through the messaging system model | 2025 | Link |
| Dropbox's scalable, consistent, metadata caching solution | 2024 | Link |
| Bringing AI-powered answers and summaries to file previews on the web | 2024 | Link |
| Dropbox's ML-powered file organization | 2023 | Link |
| How Dropbox uses ML to identify date formats in file names | 2023 | Link |
| How Dropbox optimizes payments with machine learning | 2021 | Link |
eBay
| Blog | Year | Read |
|---|---|---|
| How eBay Exports Billion-Scale Graphs on Transactional Graph Databases | 2023 | Link |
| eBay's Personalized User-Based Ranking Model for Recommendations | 2023 | Link |
| How Multimodal Embeddings Elevate eBay's Product Recommendations | 2023 | Link |
| eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine | 2023 | Link |
| How eBay Handles Real-Time Push Notifications at Scale | 2022 | Link |
| Building a Deep Learning Based Retrieval System for Personalized Recommendations | 2022 | Link |
| How eBay Loads and Updates Over Ten-Billion-Vertex Graphs | 2021 | Link |
| eBay's real-time and performant index service for its large-scale, in-house database platform | 2021 | Link |
Expedia
| Blog | Year | Read |
|---|---|---|
| Inside Expedia’s Migration to ScyllaDB for Change Data Capture | 2024 | Link |
| How Expedia Built its Core Machine Learning Platform | 2024 | Link |
| How Expedia Group ranks website search results | 2024 | Link |
| How Expedia Built a Tool to Query Near Real-Time Streaming Data | 2023 | Link |
| Configuration Management at Expedia Group | 2023 | Link |
Facebook (Meta)
| Blog | Year | Read |
|---|---|---|
| Indexing code at scale with Glean - Meta’s open source system | 2024 | Link |
| Inside Facebook’s video delivery system | 2024 | Link |
| Meta's Sequence learning Model for personalized ads recommendations | 2024 | Link |
| How Meta animates AI-generated images at scale | 2024 | Link |
| How Meta trains large language models at scale | 2024 | Link |
| Building Meta’s GenAI Infrastructure | 2024 | Link |
| RoCE networks for distributed AI training at scale | 2024 | Link |
| How Meta built the infrastructure for Threads | 2023 | Link |
| Building end-to-end security for Messenger | 2023 | Link |
| Modernizing Meta’s data platform | 2023 | Link |
| How Precision Time Protocol is being deployed at Meta | 2022 | Link |
| Scaling data ingestion for machine learning training at Meta | 2022 | Link |
| Meta’s cloud gaming infrastructure | 2022 | Link |
| Cache made consistent - How Meta handles cache invalidation | 2022 | Link |
| A highly available, strongly consistent storage service using chain replication | 2022 | Link |
| Making a distributed priority queue disaster-ready | 2022 | Link |
| How we built a general purpose key value store for Facebook with ZippyDB | 2021 | Link |
| Fully Sharded Data Parallel: faster AI training with fewer GPUs | 2021 | Link |
| How Facebook encodes your videos | 2021 | Link |
| Scaling a distributed priority queue at Meta | 2021 | Link |
| How machine learning powers Facebook’s News Feed ranking algorithm | 2021 | Link |
| How Meta scaled Live streaming for millions of viewers simultaneously | 2020 | Link |
Figma
| Blog | Year | Read |
|---|---|---|
| The infrastructure behind AI search in Figma | 2024 | Link |
| Speeding up file load times at Figma | 2024 | Link |
| Figma's LiveGraph: a real-time data system at scale | 2024 | Link |
| How Figma horizontally scaled Postgres to unlock nearly infinite scalability | 2024 | Link |
| How Figma improved performance and load time with incremental frame loading | 2024 | Link |
| How Figma reduced potential instability by scaling to multiple databases | 2023 | Link |
| The hidden challenges of autosave | 2020 | Link |
| Figma's deep search to find the right files even faster | 2020 | Link |
Flipkart
| Blog | Year | Read |
|---|---|---|
| Flipkart's MySQL Highly Available Setup | 2023 | Link |
| Running a multi-region Zookeeper at Flipkart | 2021 | Link |
| Memory Tuning a High Throughput Microservice | 2021 | Link |
| Building Flipkart's Personalized Search Autosuggestion | 2021 | Link |
| Predicting your next query even before you type! | 2021 | Link |
| How Flipkart Adapted Search to Indian Phonetics | 2020 | Link |
GitHub
| Blog | Year | Read |
|---|---|---|
| How we improved push processing on GitHub | 2024 | Link |
| How GitHub uses merge queue to ship hundreds of changes every day | 2024 | Link |
| How GitHub Docs’ new search works | 2023 | Link |
| The technology behind GitHub’s new code search | 2023 | Link |
| Scaling Git’s garbage collection | 2022 | Link |
| Improve Git monorepo performance with a file system monitor | 2022 | Link |
| Partitioning GitHub’s relational databases to handle scale | 2021 | Link |
GoDaddy
| Blog | Year | Read |
|---|---|---|
| How A/B Testing Transformed Product Development at GoDaddy | 2025 | Link |
| AI-Powered Social Media Posts | 2025 | Link |
| Generative AI Domain Search | 2024 | Link |
| How LLMs Are Enhancing GoDaddy’s CMS Experience | 2024 | Link |
| API Gateway at GoDaddy | 2023 | Link |
| GoDaddy's Search Data Infrastructure to find domain names | 2022 | Link |
Google Research
| Blog | Year | Read |
|---|---|---|
| Load balancing with random job arrivals | 2025 | Link |
| Transformers in music recommendation | 2024 | Link |
| Scaling multimodal understanding to long videos | 2023 | Link |
| Answering billions of reporting queries each day with low latency | 2023 | Link |
| Grammar checking at Google Search scale | 2023 | Link |
| World scale inverse reinforcement learning in Google Maps | 2023 | Link |
| Resolving code review comments with ML | 2023 | Link |
Grab
| Blog | Year | Read |
|---|---|---|
| Grab AI Gateway: Connecting Grabbers to multiple GenAI providers | 2025 | Link |
| Leveraging RAG-powered LLMs for analytical tasks | 2024 | Link |
| ML Model serving platform at Grab | 2024 | Link |
| LLM-powered data classification for data entities at scale | 2024 | Link |
| Enabling near real-time data analytics on the data lake | 2024 | Link |
| The journey of building a comprehensive attribution platform | 2024 | Link |
| Kafka on Kubernetes: Reloaded for fault tolerance | 2023 | Link |
| Sliding window rate limits in distributed systems | 2023 | Link |
| Road localisation in GrabMaps | 2023 | Link |
| Building hyperlocal GrabMaps | 2023 | Link |
| How Grab stores and processes millions of orders daily | 2022 | Link |
| How Kafka Connect helps move data seamlessly at Grab | 2022 | Link |
| Real-time data ingestion in Grab | 2022 | Link |
| How Grab built a scalable, high-performance ad server | 2022 | Link |
| Using real-world patterns to improve driver-rider matching | 2021 | Link |
| Search indexing optimisation at Grab | 2021 | Link |
| How Grab Built its In-house Chat Platform for the Web | 2020 | Link |
Gusto
| Blog | Year | Read |
|---|---|---|
| API Versioning At Gusto | 2025 | Link |
| How Gusto tackles AI Hallucinations in LLM Apps | 2025 | Link |
| Platform Engineering at Gusto | 2024 | Link |
| How Gusto simplifies large monoliths | 2023 | Link |
Hashnode
| Blog | Year | Read |
|---|---|---|
| Hashnode's Feed Architecture | 2023 | Link |
| Hashnode's Overall Architecture | 2023 | Link |
| How Hashnode generates personalized feeds that match users' interests | 2023 | Link |
| Hashnode's Rate Limiting Architecture | 2023 | Link |
| Building an Event-Driven Architecture at Hashnode | 2022 | Link |
| How Hashnode Sends Mass Personalised Emails using AWS Serverless Technologies | 2022 | Link |
| How Hashnode Leverages Serverless for Backing up Posts | 2022 | Link |
| How Hashnode Built Serverless Audio Blogs with AWS | 2022 | Link |
Hostinger
| Blog | Year | Read |
|---|---|---|
| How Hostinger Built one of the most advanced LLM-based chat assistants | 2024 | Link |
| How Hostinger Keeps Your Websites Safe | 2024 | Link |
| How Hostinger Deals With DDoS Attacks | 2022 | Link |
Hotstar
| Blog | Year | Read |
|---|---|---|
| Scaling Infrastructure for Millions at Hotstar | 2024 | Link |
| Hotstar’s tale of 10x scale up | 2023 | Link |
| Capturing A Billion Emo(j)i-ons | 2020 | Link |
HubSpot
| Blog | Year | Read |
|---|---|---|
| How Does HubSpot's Prediction Engine Score Millions of CRM Objects Daily | 2024 | Link |
| How HubSpot Upgraded a Thousand MySQL Clusters at Once | 2023 | Link |
| Saving Millions on the storage costs of application logs at HubSpot | 2023 | Link |
| Building a Fast, Thread-safe Hotspot Tracking Library | 2022 | Link |
| Cross Datacenter MySQL Data Replication | 2022 | Link |
| Supporting Cross-Region Kafka Messaging | 2022 | Link |
| Improving Database Reliability: Preventing Hotspotting with Client-Side Request Deduplication | 2022 | Link |
| Building a Vitess Balancer to Minimize MySQL Downtime | 2022 | Link |
Instacart
| Blog | Year | Read |
|---|---|---|
| Real-time Fraud Detection with Yoda and ClickHouse | 2024 | Link |
| How Instacart Uses ML to Suggest Replacements for Out-of-Stock Products | 2024 | Link |
| Sequence models for Contextual Recommendations at Instacart | 2024 | Link |
| Supercharging Discovery in Search with LLMs | 2024 | Link |
| Optimizing search relevance at Instacart using hybrid retrieval | 2024 | Link |
| Instacart’s Item Availability Architecture: Solving for scale and consistency | 2023 | Link |
| Instacart's one Deep Learning model for multiple surfaces | 2023 | Link |
| Distributed Machine Learning at Instacart | 2023 | Link |
| How Instacart Uses Embeddings to Improve Search Relevance | 2022 | Link |
| The Journey to Real-Time Machine Learning at Instacart | 2022 | Link |
| How Instacart Uses ML-Driven Autocomplete to Help People Fill Their Carts | 2022 | Link |
| How Instacart optimized its Logistics engine using ML | 2021 | Link |
| A simple search query correction heuristic for the resource-constrained | 2020 | Link |
| Predicting the real-time availability of 200 million grocery items | 2018 | Link |
| How Instacart delivers on time | 2018 | Link |
Instagram
| Blog | Year | Read |
|---|---|---|
| Scaling the Instagram Explore recommendations system | 2023 | Link |
| Reducing Instagram’s basic video compute time by 94 percent | 2022 | Link |
| Improving Instagram notification management with machine learning and causal inference | 2022 | Link |
| Building text animations for Instagram Stories | 2022 | Link |
| Pushing the limits of compression in Facebook’s mobile apps | 2021 | Link |
| How Instagram suggests new content | 2020 | Link |
LinkedIn
| Blog | Year | Read |
|---|---|---|
| Scalable federated learning at LinkedIn | 2025 | Link |
| Building a resilient DNS client for web-scale infrastructure | 2025 | Link |
| Journey of next generation control plane for data systems | 2025 | Link |
| Candidate Generation in a Large Scale Graph Recommendation System | 2024 | Link |
| Accelerating LinkedIn’s My Network tab by reducing latency and improving UX | 2024 | Link |
| Tuning Java for high-performance services | 2024 | Link |
| LinkedIn OpenHouse for Big Data Management | 2023 | Link |
| How LinkedIn Adopted A GraphQL Architecture for Product Development | 2023 | Link |
| How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers | 2023 | Link |
| Building the Infrastructure for Delivering Labor Market Insights from LinkedIn Data | 2023 | Link |
| Upscaling LinkedIn's Profile Datastore While Reducing Costs | 2023 | Link |
| Unifying Messaging Experiences across LinkedIn | 2023 | Link |
| Applying multitask learning to AI models at LinkedIn | 2022 | Link |
| Building a mutable dataset in data lake | 2022 | Link |
| Completing a member knowledge graph with Graph Neural Networks | 2021 | Link |
| Homepage feed multi-task learning using TensorFlow | 2021 | Link |
| Evolving LinkedIn’s analytics tech stack | 2021 | Link |
| Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes | 2021 | Link |
| HTTP/2 in infrastructure: Ambry network stack refactoring | 2021 | Link |
| Building a heterogeneous social network recommendation system | 2020 | Link |
Lyft
| Blog | Year | Read |
|---|---|---|
| From Big Data to Better Data: Ensuring Data Quality with Verity | 2023 | Link |
| Building Real-time Machine Learning Foundations at Lyft | 2023 | Link |
| The Recommendation System at Lyft | 2023 | Link |
| lyft2vec — Embeddings at Lyft | 2023 | Link |
| Powering Millions of Real-Time Decisions with LyftLearn Serving | 2023 | Link |
| Pricing at Lyft | 2022 | Link |
| ML Model Training Infrastructure built on Kubernetes | 2021 | Link |
| Elasticsearch Optimizations at Lyft | 2021 | Link |
| How Lyft discovered OpenStreetMap is the Freshest Map for Rideshare | 2021 | Link |
| Using Client-Side Map Data to Improve Real-Time Positioning | 2021 | Link |
| How Lyft predicts a rider’s destination for better in-app experience | 2020 | Link |
| A New Real-Time Map-Matching Algorithm at Lyft | 2020 | Link |
Medium
| Blog | Year | Read |
|---|---|---|
| Taming Post Claps - The Two Billion Claps Bug | 2024 | Link |
| How Medium uses ScyllaDB to build a fast and scalable data layer | 2024 | Link |
| Building a ChatGPT Plugin for Medium | 2023 | Link |
| Fixing duplicate stories in Medium’s For You feed | 2023 | Link |
| Kubernetes Infrastructure At Medium | 2023 | Link |
| How Medium counts your followers | 2020 | Link |
| Scaling Email Infrastructure for Medium Digest | 2020 | Link |
| Mapping Medium’s Tags using ML | 2018 | Link |
| Microservice Architecture at Medium | 2018 | Link |
Netflix
| Blog | Year | Read |
|---|---|---|
| Foundation Model for Personalized Recommendation | 2025 | Link |
| How Netflix processes billions of impressions daily | 2025 | Link |
| Netflix’s Distributed Counter Abstraction | 2024 | Link |
| Evolving Netflix’s WebSocket proxy for the future | 2024 | Link |
| Netflix’s Key-Value Data Abstraction Layer | 2024 | Link |
| Netflix’s TimeSeries Data Abstraction Layer | 2024 | Link |
| Recommending for Long-Term Member Satisfaction at Netflix | 2024 | Link |
| Maestro: Data/ML Workflow Orchestrator at Netflix | 2024 | Link |
| Reverse Searching Netflix’s Federated Graph | 2024 | Link |
| Supporting Diverse ML Systems at Netflix | 2024 | Link |
| Rebuilding Netflix Video Processing Pipeline with Microservices | 2024 | Link |
| Building In-Video Search | 2023 | Link |
| Streaming SQL in Data Mesh | 2023 | Link |
| Migrating Netflix to GraphQL Safely | 2023 | Link |
| Scaling Media Machine Learning at Netflix | 2023 | Link |
| Building a Media Understanding Platform for ML Innovations | 2023 | Link |
| Finding Cuts with Smooth Visual Transitions Using Machine Learning | 2022 | Link |
| Machine Learning for Fraud Detection in Streaming Services | 2022 | Link |
| Netflix’s High-Throughput, Low-Latency Priority Queueing System | 2022 | Link |
| Rapid Event Notification System at Netflix | 2022 | Link |
| Building Netflix’s Distributed Tracing Infrastructure | 2020 | Link |
Notion
| Blog | Year | Read |
|---|---|---|
| Building and scaling Notion’s data lake | 2024 | Link |
| How we sped up Notion in the browser with WASM SQLite | 2024 | Link |
| The Great Re-shard: adding Postgres capacity (again) with zero downtime | 2023 | Link |
| Creating the Notion API | 2022 | Link |
| The data model behind Notion's flexibility | 2021 | Link |
| Lessons learned from sharding Postgres at Notion | 2021 | Link |
PayPal
| Blog | Year | Read |
|---|---|---|
| Scaling PayPal’s AI Capabilities with PayPal Cosmos.AI Platform | 2024 | Link |
| Scaling Kafka to Support PayPal’s Data Growth | 2023 | Link |
| JunoDB: PayPal’s Key-Value Store | 2023 | Link |
| Scaling Kubernetes to Over 4k Nodes and 200k Pods | 2022 | Link |
| GraphQL at PayPal: An Adoption Story | 2021 | Link |
| How PayPal Uses Real-time Graph Database and Graph Analysis to Fight Fraud | 2021 | Link |
| Next-Gen Data Movement Platform at PayPal | 2021 | Link |
| Deploying Large-scale Fraud Detection Machine Learning Models at PayPal | 2021 | Link |
Pinterest
| Blog | Year | Read |
|---|---|---|
| How Pinterest improved Search Relevance using LLMs | 2025 | Link |
| How Pinterest built its Text-to-SQL feature | 2024 | Link |
| Change Data Capture at Pinterest | 2024 | Link |
| Real Time Anomaly Detection at Pinterest | 2023 | Link |
| Improving Distributed Caching Performance and Efficiency at Pinterest | 2022 | Link |
| How Pinterest Leverages Realtime User Actions to Boost Homefeed Engagement Volume | 2022 | Link |
| How Pinterest scaled the size of its ad corpus by 60x | 2021 | Link |
| The machine learning behind delivering relevant ads | 2021 | Link |
Quora
| Blog | Year | Read |
|---|---|---|
| Building Embedding Search at Quora | 2024 | Link |
| Migrating a decade of Redshift usages to Trino at Quora | 2024 | Link |
| Trino at Quora Scale: Cost, Speed, and Reliability | 2023 | Link |
| MySQL sharding at Quora | 2020 | Link |
Razorpay
| Blog | Year | Read |
|---|---|---|
| Razorpay’s Authentication Revamp | 2023 | Link |
| The Making of Razorpay Developer-Console | 2023 | Link |
| How Razorpay Reduced Data Platform Cost by $2M | 2023 | Link |
| Reducing Kubernetes cost by $300,000 at Razorpay | 2023 | Link |
| How does Razorpay Capital Detect Duplicate or Fraud Merchants? | 2023 | Link |
| Razorpay's Real-Time Denormalized Data Streaming Platform | 2023 | Link |
| How Razorpay’s Notification Service Handles Increasing Load | 2022 | Link |
| How Trino and Alluxio power analytics at Razorpay | 2022 | Link |
| Handling Burst Traffic During IPL | 2021 | Link |
Reddit
| Blog | Year | Read |
|---|---|---|
| Evolving Reddit's Media Infrastructure | 2025 | Link |
| Scaling our Apache Flink powered real-time ad event validation pipeline | 2025 | Link |
| Scaling Reddit’s ad-serving system | 2024 | Link |
| Product Candidate Generation for Reddit Dynamic Product Ads | 2024 | Link |
| Scaling Ads Pacing: from Singleton to Sharded | 2024 | Link |
| Introducing a Global Retrieval Ranking Model in the Ads Funnel | 2024 | Link |
| Building an Experiment-Based Routing Service | 2023 | Link |
| The Reddit Media Metadata Store | 2023 | Link |
Salesforce
| Blog | Year | Read |
|---|---|---|
| Scaling Real-Time Search to 30 Billion Queries with Sub-Second Latency and 0% Downtime | 2025 | Link |
| Scaling Agentic AI Powering 2 Billion Predictions Monthly | 2025 | Link |
| How Agentforce Data Library Powers RAG with 99.99% Uptime | 2025 | Link |
| Secrets for Managing 100,000 Training and Metadata Requests Per Minute | 2024 | Link |
| Inside the Brain of Agentforce | 2024 | Link |
| How Salesforce Supports Millions of Users Seamlessly for GenAI | 2024 | Link |
| Inside Salesforce’s Scalable Time Series Forecasting AI Platform | 2024 | Link |
| How Salesforce's Data Cloud Handles 250 Trillion Transactions Weekly | 2024 | Link |
Shopify
| Blog | Year | Read |
|---|---|---|
| How Shopify improved consumer search intent with real-time ML | 2024 | Link |
| Horizontally scaling the Rails backend of Shop app with Vitess | 2024 | Link |
| Improving Shopify App’s Performance | 2024 | Link |
| Building a ShopifyQL Code Editor | 2023 | Link |
| Creating a Flexible Order Routing System with Shopify Functions | 2023 | Link |
| Using Server Sent Events to Simplify Real-time Streaming at Scale | 2022 | Link |
| Capturing Every Change From Shopify’s Sharded Monolith | 2021 | Link |
Slack
| Blog | Year | Read |
|---|---|---|
| How Slack Optimizes its E2E Pipeline | 2025 | Link |
| How Slack built enterprise search to be secure and private | 2025 | Link |
| Advancing Our Chef Infrastructure | 2024 | Link |
| How We Re-Architected Slack for Our Largest Customers | 2024 | Link |
| How Slack automatically detects stolen session cookies | 2024 | Link |
| How a request flows - from a Slack user’s perspective | 2023 | Link |
| Slack’s Migration to a Cellular Architecture | 2023 | Link |
| Real-time Messaging at Slack | 2023 | Link |
| How Slack traces the flow of notifications across systems | 2023 | Link |
| Slack's Unified end-to-end machine learning infrastructure to generate recommendations | 2023 | Link |
| How We Design Our APIs at Slack | 2021 | Link |
| How we built an eventually-consistent data model to predict Slack Connect invites | 2021 | Link |
| Migrating Millions of Concurrent Websockets to Envoy | 2021 | Link |
| Scaling Datastores at Slack with Vitess | 2020 | Link |
Snap
| Blog | Year | Read |
|---|---|---|
| Bento - Snap's ML Platform | 2025 | Link |
| Snap's Embedding-based Retrieval for its video recommendation system | 2023 | Link |
| How Snap Sped Up Feature Engineering for Recommendation Systems | 2022 | Link |
| How Snap leverages synthetic data to boost the development of ML models | 2022 | Link |
| Training Large-Scale Recommendation Models with TPUs | 2022 | Link |
| Machine Learning for Snapchat Ad Ranking | 2022 | Link |
Spotify
| Blog | Year | Read |
|---|---|---|
| How Spotify Generated Millions of Content Annotations | 2024 | Link |
| Spotify's Data Platform | 2024 | Link |
| The What, Why, and How of Mastering App Size | 2023 | Link |
| How Spotify Automated Content Marketing to Acquire Users at Scale | 2023 | Link |
| How We Built Infrastructure to Run User Forecasts at Spotify | 2022 | Link |
Squarespace
| Blog | Year | Read |
|---|---|---|
| Why We Built a Write Back Cache for Our Asset Library | 2024 | Link |
| Developing Fluid Engine | 2022 | Link |
| How we use WebGL at Squarespace | 2022 | Link |
| A Better Way to Upload Images | 2022 | Link |
Stripe
| Blog | Year | Read |
|---|---|---|
| Stripe’s system for tracking and validating money movement | 2024 | Link |
| How Stripe Processed $1 Trillion in Payments with Zero Downtime | 2023 | Link |
| How Stripe built its fraud prevention system | 2023 | Link |
| How Stripe builds interactive docs with Markdoc | 2022 | Link |
| Stripe’s payments APIs: The first 10 years | 2020 | Link |
Swiggy
| Blog | Year | Read |
|---|---|---|
| Swiggy's Text-to-SQL Solution | 2024 | Link |
| Optimising the picking process to enable faster deliveries for Instamart | 2024 | Link |
| Improving search relevance in hyperlocal food delivery using (small) language models | 2024 | Link |
| Predicting Food Delivery Time at Cart | 2023 | Link |
| Contextual Bandits for Ads Recommendations | 2022 | Link |
| Using deep learning to detect dissonance between address text and location | 2022 | Link |
| Designing Resilient Microservices at Swiggy | 2021 | Link |
| Designing the Serviceability Platform at Swiggy for High Scale | 2021 | Link |
| A brief introduction to Engineering challenges at Swiggy | 2021 | Link |
| Re-Architecting Swiggy’s logistics systems | 2021 | Link |
| Using Deep Learning for Ranking in Dish Search | 2021 | Link |
| Learning to Predict Two-Wheeler Travel Distance | 2021 | Link |
| Learning To Rank Restaurants | 2021 | Link |
| Running Geo Queries At Scale | 2020 | Link |
| Building Video Stories and Caching | 2020 | Link |
| Deploying deep learning models at scale at Swiggy | 2020 | Link |
| Decoding Food Intelligence at Swiggy | 2020 | Link |
Target
| Blog | Year | Read |
|---|---|---|
| A Deep Dive into Data Replication Mechanisms | 2025 | Link |
| Predictive Modeling for Availability of Inventory | 2024 | Link |
| Contextual Offer Recommendations Engine at Target | 2024 | Link |
| Bundled Product Recommendations at Target | 2024 | Link |
| Target AutoComplete: Real Time Item Recommendations at Target | 2023 | Link |
| Real-Time Personalization Using Microservices | 2023 | Link |
Timescale
| Blog | Year | Read |
|---|---|---|
| Document Loading, Parsing, and Cleaning in AI Applications | 2025 | Link |
| Building a RAG System With Claude, PostgreSQL & Python on AWS | 2025 | Link |
| Automating Data Enrichment in PostgreSQL With OpenAI | 2025 | Link |
| Semantic Search With Ollama and PostgreSQL | 2025 | Link |
| PostgreSQL Indexes for Columnstore | 2025 | Link |
| Handling Billions of Rows in PostgreSQL | 2025 | Link |
| Enhancing Text-to-SQL With Synthetic Summaries | 2025 | Link |
| Scale PostgreSQL via Partitioning | 2024 | Link |
Tinder
| Blog | Year | Read |
|---|---|---|
| Tinder API Style Guide | 2024 | Link |
| Building Obsidian, Tinder’s Design System | 2023 | Link |
| How Tinder built its API Gateway | 2022 | Link |
| Scaling out Tinder Android Payment Flow using State Machine | 2020 | Link |
Twitch
| Blog | Year | Read |
|---|---|---|
| Ingesting Live Video Streams at Global Scale | 2022 | Link |
| Breaking the Monolith at Twitch | 2022 | Link |
| Using Machine Learning to Review Emotes | 2022 | Link |
| Defense, threat modeling and High Availability at Twitch | 2021 | Link |
Uber
| Blog | Year | Read |
|---|---|---|
| Migrating Uber’s Compute Platform to Kubernetes | 2025 | Link |
| MySQL At Uber | 2025 | Link |
| How Uber Uses Ray® to Optimize the Rides Business | 2025 | Link |
| How Uber Optimizes LLM Training | 2024 | Link |
| Natural Language to SQL Using Gen AI | 2024 | Link |
| Lucene: Uber’s Search Platform | 2024 | Link |
| Uber’s implementation of Live Activity on iOS | 2024 | Link |
| Odin: Uber’s Stateful Platform | 2024 | Link |
| Kafka Tiered Storage at Uber | 2024 | Link |
| Modernizing Logging at Uber with CLP | 2024 | Link |
| How Uber ensures Apache Cassandra®’s tolerance for single-zone failure | 2024 | Link |
| How LedgerStore Supports Trillions of Indexes at Uber | 2024 | Link |
| Balancing HDFS DataNodes in the Uber DataLake | 2024 | Link |
| How Uber Serves Over 40 Million Reads Per Second from Online Storage Using an Integrated Cache | 2024 | Link |
| How Uber Optimized Cassandra Operations At Scale | 2023 | Link |
| How Uber Optimizes the Timing of Push Notifications using ML and Linear Programming | 2022 | Link |
| Deduping and Storing Images at Uber Eats | 2022 | Link |
| Uber’s Next Gen Push Platform on gRPC | 2022 | Link |
| Uber’s Highly Scalable and Distributed Shuffle as a Service | 2022 | Link |
| How Uber Predicts Arrival Times Using Deep Learning | 2022 | Link |
| Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot | 2021 | Link |
Vimeo
| Blog | Year | Read |
|---|---|---|
| Unlocking knowledge sharing for videos with RAG | 2024 | Link |
| A deep dive into Vimeo’s storage strategy for videos | 2023 | Link |
Walmart
| Blog | Year | Read |
|---|---|---|
| Walmart’s Cassandra CDC Solution | 2022 | Link |
| Scaling the Walmart Inventory Reservations API for Peak Traffic | 2022 | Link |
| A Markov Chain Formulation for the Grocery Item Picking Process | 2021 | Link |
| How we rebuilt the Walmart Autocomplete Backend | 2021 | Link |
| Building a Notification Framework for Microservice-based Application | 2021 | Link |
Twitter (X)
| Blog | Year | Read |
|---|---|---|
| Twitter's Recommendation Algorithm | 2023 | Link |
| How we scaled Reads On the Twitter Users Database | 2023 | Link |
| Powering real-time data analytics with Druid at Twitter | 2022 | Link |
| How we built Twitter’s highly reliable ads pacing service | 2021 | Link |
| Storing and retrieving millions of ad impressions per second | 2021 | Link |
| Processing billions of events in real time at Twitter | 2021 | Link |
| Logging at Twitter | 2021 | Link |
| Twitter’s ads serving platform | 2021 | Link |
Yelp
| Blog | Year | Read |
|---|---|---|
| Search Query Understanding with LLMs | 2025 | Link |
| Enhancing Neural Network Training at Yelp | 2025 | Link |
| Boosting ML Pipeline Efficiency | 2024 | Link |
| Yelp’s AI pipeline for inappropriate language detection in reviews | 2024 | Link |
| Rebuilding a Cassandra cluster using Yelp’s Data Pipeline | 2023 | Link |
Zendesk
| Blog | Year | Read |
|---|---|---|
| Improving job execution by ditching the job executor | 2025 | Link |
| Provisioning Kafka topics the easy way | 2024 | Link |
| Moving from DynamoDB to tiered storage with MySQL+S3 | 2023 | Link |
Zillow
| Blog | Year | Read |
|---|---|---|
| Leveraging Knowledge Graphs in Real Estate Search | 2025 | Link |
| The Data Infra Behind Zillow’s 3x Growth in Experiment Volume | 2023 | Link |
| Serving Machine Learning Models Efficiently at Scale at Zillow | 2022 | Link |
| Optimizing Elasticsearch for Low Latency, Real-Time Recommendations | 2022 | Link |
Zomato
| Blog | Year | Read |
|---|---|---|
| Building a cost-effective logging platform for petabyte scale | 2023 | Link |
| How Zomato Handles 100 Million Daily Search Queries | 2023 | Link |
| How Zomato Powers restaurant ads using ML | 2022 | Link |
| How Zomato uses embeddings to identify and cluster unique addresses | 2022 | Link |
| How Zomato predicts your order's Food preparation time | 2022 | Link |
| How Zomato locates its users | 2021 | Link |
| The Deep Tech Behind Estimating Food Preparation Time | 2020 | Link |
