Rate limiting is a technique for limiting network traffic to prevent users from exhausting system resources. Rate limiting makes it harder for malicious actors to overburden the system with attacks such as Denial of Service (DoS), in which attackers flood a target system with requests, consuming too much network capacity, storage, and memory.
APIs that use rate limiting can throttle or temporarily block any client that tries to make too many API calls. It might slow down a throttled user’s requests for a specified time or deny them altogether. Rate limiting ensures that legitimate requests can reach the system and access information without impacting the overall application’s performance.
This is part of a series of articles about website security.
Rate limiting is a crucial part of a modern cybersecurity strategy. It addresses several attack techniques that affect the incoming request rate.
A DDoS attack attempts to overwhelm the target system with traffic, making it unavailable to legitimate users. Rate limiting mitigates DDoS threats by preventing any given traffic source from sending too many requests.
However, DDoS attacks pose a unique challenge because they distribute requests among many different sources, sometimes millions of IP addresses, so each individual source can stay below the rate limit. An effective security solution must recognize requests from many different locations as part of a single attack and treat them as one source.
When attackers compromise databases containing user credentials, they can use these credentials to carry out further attacks. Usually, a bot stuffs stolen user credentials into a login form until a credential set works, allowing the bot to access the account.
Bots are often highly successful because they can submit hundreds or thousands of credentials into a login form. Rate limiting helps identify credential stuffing and block the bots before they take over the account.
A brute force attack is similar to a credential stuffing attack but without a list of real user credentials. In this case, the bot systematically submits randomly generated credentials until a credential set works.
A strongly secured web application sets password requirements that help mitigate brute force attacks, but large attacks can still consume many network resources. Rate limiting blocks these attacks to save system resources.
Malicious actors often scrape target websites for information they can sell or use to undermine competitors. For example, an attacker might steal pricing information from an eCommerce company. A scraper bot can copy large amounts of data from target applications. Rate limiting detects and blocks data scraping.
An inventory denial or inventory hoarding attack involves sending bots to a target web application, where they start transactions without finishing them. These unfinished transactions hoard the inventory, making it unavailable to legitimate users.
Rate limiting works within applications, not in the web server. Rate limiting typically involves tracking the IP addresses where requests originate and identifying the time lapsed between requests. IP addresses are the application’s main way to identify who has made each request.
Rate-limiting solutions work by measuring the elapsed time between every request from a given IP address and tracking the number of requests made in a set timeframe. If one IP address makes too many requests within the specified timeframe, the rate-limiting solution throttles the IP address and doesn’t fulfill its requests for the next timeframe.
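To make this concrete, here is a minimal Python sketch of the mechanism just described: it counts requests per IP within a fixed timeframe and refuses further requests for a cooldown period once an IP exceeds the limit. The names and thresholds (200 requests, 60-second window and cooldown) are illustrative assumptions, not any particular product's API.

```python
import time

# Illustrative limits: 200 requests per 60-second window per IP,
# with a 60-second cooldown once an IP exceeds the limit.
LIMIT = 200
WINDOW = 60.0
COOLDOWN = 60.0

window_start: dict[str, float] = {}   # ip -> start of its current window
request_count: dict[str, int] = {}    # ip -> requests seen in that window
blocked_until: dict[str, float] = {}  # ip -> time until which requests are refused

def allow_request(ip: str) -> bool:
    now = time.monotonic()
    if blocked_until.get(ip, 0.0) > now:
        return False  # still throttled from a previous violation
    if ip not in window_start or now - window_start[ip] >= WINDOW:
        window_start[ip] = now  # open a fresh window for this IP
        request_count[ip] = 0
    request_count[ip] += 1
    if request_count[ip] > LIMIT:
        blocked_until[ip] = now + COOLDOWN  # refuse requests for the next timeframe
        return False
    return True
```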
Rate-limited applications can tell individual users to slow down if they make requests too frequently. It's akin to police officers pulling over drivers who exceed the speed limit, or parents telling their children not to eat too much sugar in a short period.
Administrators can define different methods and parameters when setting a rate limit. An organization's chosen rate-limiting technique depends on the objective and the required level of restriction. Here are three main approaches to rate limiting that an organization might implement:
There are several types of rate-limiting algorithms.
Fixed-window rate limiting algorithms restrict the number of requests allowed during a given timeframe (window). For instance, a server’s rate-limiting component might implement an algorithm that accepts up to 200 API requests per minute. There is a fixed timeframe starting from a specified time—the server will not serve more than 200 requests between 9:00 and 9:01, but the window will reset at 9:01, allowing another 200 requests until 9:02.
Developers can implement a fixed-window algorithm at the server or user level. Implementing the algorithm at the user level will restrict each user to 200 requests per minute. In contrast, a server-level algorithm will restrict the server, meaning that all users combined can make up to 200 requests per minute.
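Below is a hedged sketch of a fixed-window limiter in Python, showing how the same class can back either a user-level or a server-level policy. The class name, limits, and helper function are assumptions made for illustration, not a real library's API.

```python
import time

class FixedWindowLimiter:
    """Allows up to `limit` requests per clock-aligned window of `window` seconds."""

    def __init__(self, limit: int = 200, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.window_id = int(time.time() // window)
        self.count = 0

    def allow(self) -> bool:
        # Windows align to the clock, e.g. 9:00-9:01, then reset at 9:01.
        current = int(time.time() // self.window)
        if current != self.window_id:
            self.window_id = current  # the window has reset
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# User-level: one limiter per user, so each user gets up to 200 requests/minute.
per_user: dict[str, FixedWindowLimiter] = {}

def allow_user_request(user_id: str) -> bool:
    return per_user.setdefault(user_id, FixedWindowLimiter()).allow()

# Server-level: a single shared limiter caps all users combined at 200/minute.
server_limiter = FixedWindowLimiter()
```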
Leaky bucket rate limiting algorithms differ from fixed-window algorithms because they don't rely on specified timeframes. Instead, they focus on a fixed-length request queue, without considering time. The server services requests on a first-come, first-served basis; new requests join the back of the queue, and the server drops any new request that arrives when the queue is full.
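A minimal Python sketch of the queue-based leaky bucket described above follows. The capacity and the handle() stub are illustrative assumptions; a real deployment would drain the queue from a scheduler or worker at a fixed rate.

```python
from collections import deque

CAPACITY = 200         # fixed queue length (illustrative)
queue: deque = deque()

def enqueue_request(request) -> bool:
    """Called when a request arrives."""
    if len(queue) >= CAPACITY:
        return False   # queue is full: the new request is dropped
    queue.append(request)  # new requests join the back of the queue
    return True

def handle(request) -> None:
    print(f"serving {request}")  # stand-in for the real request handler

def leak_one() -> None:
    """Called at a steady rate to service the oldest request, first-come, first-served."""
    if queue:
        handle(queue.popleft())
```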
Sliding-window rate limiting algorithms are similar to fixed-window algorithms except for the starting point of each time window. With sliding-window rate limiting, the timeframe starts only when a user makes a new request, not at a predetermined time. For instance, if the first request arrives at 9:00:24 am (and the rate limit is 200 per minute), the server will allow up to 200 requests until 9:01:24.
Sliding-window algorithms help solve the burstiness problem of fixed-window rate limiting, where a client can cluster requests around the boundary between two windows and briefly exceed the intended rate. They also mitigate the starvation issue facing leaky bucket rate limiting by providing more flexibility.
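One common way to implement this is a sliding-window log: keep the timestamps of each client's recent requests, evict those older than the window, and allow a new request only if the remaining count is below the limit. The Python sketch below assumes the article's 200-per-minute example; the names are illustrative.

```python
import time
from collections import defaultdict, deque

LIMIT = 200    # requests allowed per window (illustrative)
WINDOW = 60.0  # window length in seconds

timestamps: defaultdict = defaultdict(deque)  # client -> recent request times

def allow_request(client_id: str) -> bool:
    now = time.monotonic()
    log = timestamps[client_id]
    while log and now - log[0] > WINDOW:
        log.popleft()  # evict requests older than the sliding window
    if len(log) < LIMIT:
        log.append(now)  # the window effectively starts at the first retained request
        return True
    return False
```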
Imperva Advanced Bot Protection enforces rate limiting for websites, mobile apps, and APIs, and can prevent business logic attacks from all access points. Gain seamless visibility and control over bot traffic to stop online fraud through account takeover or competitive price scraping.
Beyond bot protection, Imperva provides comprehensive protection for applications, APIs, and microservices:
Web Application Firewall – Prevent attacks with world-class analysis of web traffic to your applications.
Runtime Application Self-Protection (RASP) – Real-time attack detection and prevention from your application runtime environment goes wherever your applications go. Stop external attacks and injections and reduce your vulnerability backlog.
API Security – Automated API protection ensures your API endpoints are protected as they are published, shielding your applications from exploitation.
DDoS Protection – Block attack traffic at the edge to ensure business continuity with guaranteed uptime and no performance impact. Secure your on-premises or cloud-based assets – whether you're hosted in AWS, Microsoft Azure, or Google Public Cloud.
Attack Analytics – Ensures complete visibility with machine learning and domain expertise across the application security stack to reveal patterns in the noise and detect application attacks, enabling you to isolate and prevent attack campaigns.
Client-Side Protection – Gain visibility and control over third-party JavaScript code to reduce the risk of supply chain fraud, prevent data breaches, and stop client-side attacks.