A rate limiter is one of those things that looks easy on paper. Count requests per user, block them if they go over, reset after a minute. That’s it.

Then I actually shipped one to production and within a week I’d found three different ways I’d gotten it wrong. Not bugs exactly. More like “this works until it doesn’t, and when it doesn’t, you find out at the worst possible time.”

This is the version I ended up with, the mistakes that got me there, and the stuff I wish someone had told me on day one.


The naive version

Here’s what I wrote first. In-memory map, counts per IP, resets every minute.

import type { Request, Response, NextFunction } from "express"

const requests = new Map<string, { count: number; resetAt: number }>()

const WINDOW_MS = 60_000
const LIMIT = 100

export function rateLimit(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip
  const now = Date.now()
  const record = requests.get(ip)

  if (!record || now > record.resetAt) {
    requests.set(ip, { count: 1, resetAt: now + WINDOW_MS })
    return next()
  }

  if (record.count >= LIMIT) {
    return res.status(429).json({ error: "Too many requests" })
  }

  record.count++
  next()
}

This worked. For about a day.


Mistake #1: the map never cleans up

You notice this when memory starts climbing. Every new IP adds a record. The records technically “expire” in the sense that their resetAt is in the past, but nothing removes them. So the map grows forever.

If your API gets any meaningful traffic, this is a memory leak with a timer attached. Eventually your process OOMs and you get paged at 2am.

The fix is boring but necessary:

// Sweep expired records once a minute. unref() so the timer alone
// doesn't keep the process alive.
setInterval(() => {
  const now = Date.now()
  for (const [ip, record] of requests) {
    if (now > record.resetAt) {
      requests.delete(ip)
    }
  }
}, 60_000).unref()

I also started thinking about using a Map that auto-evicts old entries. An LRU cache library handles this better than a hand-rolled interval. But even the interval version is fine for a single-process app.

The bigger lesson: in-memory state in a Node app is only fine until it isn’t. And the “isn’t” always shows up in production.
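The sweep is easier to trust if it's factored into a pure function with the clock passed in, so it can be tested without timers. A sketch — `sweepExpired` is my name, not something from the middleware above:

```typescript
// The same sweep as a pure, clock-injected function.
function sweepExpired(
  map: Map<string, { count: number; resetAt: number }>,
  now: number
): number {
  let removed = 0
  for (const [key, record] of map) {
    if (now > record.resetAt) {
      map.delete(key) // safe: Map iteration tolerates deletes
      removed++
    }
  }
  return removed
}

// Two expired records and one live one.
const entries = new Map([
  ["1.1.1.1", { count: 5, resetAt: 10_000 }],
  ["2.2.2.2", { count: 3, resetAt: 20_000 }],
  ["3.3.3.3", { count: 1, resetAt: 90_000 }],
])
const removed = sweepExpired(entries, 60_000)
```

At `now = 60_000` the first two records are past their `resetAt` and get removed; the third survives.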


Mistake #2: it didn’t survive a restart

Deployed a new version. Rate limits reset for everybody. Anybody who was being throttled got a clean slate for free.

In isolation that’s not the end of the world. The problem is this: if you’re using a rate limiter for anything beyond “be nice to our servers” — say, preventing brute-force login attempts — then clearing it on deploy is a security hole. An attacker who notices gets infinite retries by hammering during your deploy window.

The fix is to move state out of the process. Redis is the standard answer. You swap your in-memory map for Redis commands:

import Redis from "ioredis"
import type { Request, Response, NextFunction } from "express"

const redis = new Redis()

const WINDOW_MS = 60_000
const LIMIT = 100

export async function rateLimit(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip
  const key = `rl:${ip}`

  // INCR creates the key at 1 if it doesn't exist; set the TTL only on
  // that first hit. Small gap: if the process dies between INCR and
  // PEXPIRE, the key never expires. A MULTI or a Lua script closes it.
  const count = await redis.incr(key)
  if (count === 1) {
    await redis.pexpire(key, WINDOW_MS)
  }

  if (count > LIMIT) {
    return res.status(429).json({ error: "Too many requests" })
  }

  next()
}

INCR is atomic, which matters because your app almost certainly runs multiple workers or processes. Separate Node processes can't share an in-memory map at all — each keeps its own counts, so the real limit quietly multiplies by the worker count. And once there's an await between reading a count and writing it back, two concurrent requests can each see 99 and both get through. Redis INCR sidesteps both problems.
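The check-then-act race can be forced deterministically in a sketch: two concurrent checks read the same counter, yield (as a real await would), and both pass. Names here are mine:

```typescript
// Force the read-check-write race: both simulated requests read the
// counter, yield to the event loop, then check and write against the
// stale read — so both see 99 and both get through.
const counter = { count: 99 }
const LIMIT = 100

async function naiveCheck(): Promise<boolean> {
  const seen = counter.count      // read
  await Promise.resolve()         // yield; real code awaits I/O here
  if (seen >= LIMIT) return false // check the stale value
  counter.count = seen + 1        // write
  return true
}

async function demo() {
  const [a, b] = await Promise.all([naiveCheck(), naiveCheck()])
  return { a, b, final: counter.count }
}
```

Both calls return true and the counter ends at 100: the limiter admitted two requests when it had room for one, and lost an update doing it.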

That alone was worth moving off the in-memory version.


Mistake #3: the window was wrong

Fixed windows are easy to reason about and they have a specific, predictable failure. A user can send 100 requests at 11:59:59 and another 100 at 12:00:00. Two hundred requests in one second.

This isn’t theoretical. I watched it happen when we rolled out a new mobile client that synced on the minute. All the clients lined up on clock boundaries. We’d rate-limit them fine within any given minute, and they’d double-fire across boundaries.
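The boundary burst is easy to reproduce with a fixed-window counter and a fake clock. A sketch, with illustrative names:

```typescript
// Fixed-window counter with an injected clock, to reproduce the
// boundary burst deterministically.
const WINDOW_MS = 60_000
const LIMIT = 100

function makeFixedWindow() {
  let windowId = -1
  let count = 0
  return (nowMs: number): boolean => {
    const id = Math.floor(nowMs / WINDOW_MS)
    if (id !== windowId) {
      windowId = id // new window: reset the counter
      count = 0
    }
    if (count >= LIMIT) return false
    count++
    return true
  }
}

// 100 requests one second before the boundary, 100 more right at it:
// every one is allowed, even though 200 landed within one second.
const allow = makeFixedWindow()
let allowed = 0
for (let i = 0; i < 100; i++) if (allow(59_000)) allowed++
for (let i = 0; i < 100; i++) if (allow(60_000)) allowed++
```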

The fix is a sliding window. The simple version uses two counters — the current window and the previous one — and computes a weighted count.

export async function slidingWindow(req, res, next) {
  const ip = req.ip
  const now = Date.now()
  const currentWindow = Math.floor(now / WINDOW_MS)
  const prevWindow = currentWindow - 1
  const keyCur = `rl:${ip}:${currentWindow}`
  const keyPrev = `rl:${ip}:${prevWindow}`

  const [curCountStr, prevCountStr] = await redis.mget(keyCur, keyPrev)
  const curCount = parseInt(curCountStr ?? "0", 10)
  const prevCount = parseInt(prevCountStr ?? "0", 10)

  const elapsed = now % WINDOW_MS
  const weight = 1 - elapsed / WINDOW_MS
  const estimated = curCount + prevCount * weight

  if (estimated >= LIMIT) {
    return res.status(429).json({ error: "Too many requests" })
  }

  await redis.multi()
    .incr(keyCur)
    .pexpire(keyCur, WINDOW_MS * 2)
    .exec()

  next()
}

Not perfect, but way better than fixed windows. For a proper sliding window, there’s the sorted-set approach where every request is a timestamped entry. More accurate, more Redis operations. I haven’t needed it yet.
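Pulling the weighted estimate out as a pure function shows how it smooths the same boundary burst from before — one second after the boundary, the previous window still carries almost all of its weight:

```typescript
// The weighted estimate from the middleware, as a pure function.
const WINDOW_MS = 60_000

function slidingEstimate(curCount: number, prevCount: number, nowMs: number): number {
  const elapsed = nowMs % WINDOW_MS      // time into the current window
  const weight = 1 - elapsed / WINDOW_MS // remaining overlap with the previous window
  return curCount + prevCount * weight
}

// One second after a boundary, with 100 requests in the previous
// window: the previous window counts at 59/60 of its weight, so a
// fresh burst is mostly blocked instead of sailing through.
const estimate = slidingEstimate(0, 100, 61_000)
```

Here `estimate` works out to 100 × 59/60 ≈ 98.3, already within two requests of the limit before the new window has seen any traffic.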


Mistake #4: rate-limiting by IP only

This is fine until someone shows up behind a NAT and suddenly a whole office or mobile carrier shares one IP. Or behind Cloudflare, where every request looks like it’s coming from Cloudflare unless you’re reading X-Forwarded-For.

For authenticated endpoints, rate-limit by user ID. For unauthenticated endpoints, rate-limit by IP but accept that you’ll occasionally wrong a big NAT. For endpoints that matter (login, signup), rate-limit by both with different limits.

const key = req.user
  ? `rl:user:${req.user.id}`
  : `rl:ip:${req.ip}`

Small change. Saved me from a support ticket about “our whole office is blocked.”

Also: if you’re behind a proxy, make sure you actually trust the right header. Express has app.set("trust proxy", ...) for this. Get it wrong and every request looks like it’s from 127.0.0.1.
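In Express that configuration is one line, though the right value depends on how many proxy hops actually sit in front of you — a sketch assuming a single hop:

```typescript
import express from "express"

const app = express()

// Trust exactly one proxy hop (e.g. a single load balancer). Express
// then derives req.ip from X-Forwarded-For instead of the raw socket
// address. Set this to the number of hops you actually control.
app.set("trust proxy", 1)
```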


Mistake #5: blanket limits on every endpoint

My first version applied the same limit everywhere. 100 requests a minute. Great for /api/search. Terrible for /api/login where you want way stricter limits, and terrible for /api/autocomplete where users legitimately make a request per keystroke.

The fix is per-route config:

const limits = {
  "/api/login": { limit: 5, window: 60_000 },
  "/api/search": { limit: 100, window: 60_000 },
  "/api/autocomplete": { limit: 300, window: 60_000 },
}

And a middleware factory instead of a single middleware:

export function limitFor(path: string) {
  // Fall back to a default so an unlisted route doesn't blow up on
  // config.limit being undefined.
  const config = limits[path] ?? { limit: 100, window: 60_000 }
  return async (req, res, next) => {
    // ... same logic, using config.limit and config.window
  }
}

Obvious in hindsight. Not obvious when you’re first writing it and just want one middleware that works everywhere.
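To make the factory concrete without dragging Redis in, here's an in-memory fixed-window version, reduced to a plain `(key) => allowed` function with an injectable clock so it can be exercised directly. The shape and defaults are illustrative, not the Redis-backed version from above:

```typescript
// In-memory fixed-window version of the per-route factory.
type RouteConfig = { limit: number; window: number }

const limits: Record<string, RouteConfig> = {
  "/api/login": { limit: 5, window: 60_000 },
  "/api/search": { limit: 100, window: 60_000 },
}

function limitFor(path: string, now: () => number = Date.now) {
  const config = limits[path] ?? { limit: 100, window: 60_000 }
  const counters = new Map<string, { count: number; resetAt: number }>()
  return (key: string): boolean => {
    const t = now()
    const rec = counters.get(key)
    if (!rec || t > rec.resetAt) {
      counters.set(key, { count: 1, resetAt: t + config.window })
      return true
    }
    if (rec.count >= config.limit) return false
    rec.count++
    return true
  }
}

// /api/login allows 5 attempts from one key, then blocks the 6th.
const loginAllow = limitFor("/api/login", () => 0)
const results = Array.from({ length: 6 }, () => loginAllow("1.2.3.4"))
```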


The thing I almost forgot

Tell the client how they’re being rate-limited. Return headers:

res.set("X-RateLimit-Limit", String(LIMIT))
res.set("X-RateLimit-Remaining", String(Math.max(0, LIMIT - count)))
res.set("X-RateLimit-Reset", String(Math.ceil(resetAt / 1000)))

Without these, a legitimate client has no way to back off before getting blocked. With them, a well-behaved client can throttle itself and never see a 429. The X-RateLimit-* names are a de facto convention rather than a formal standard, but clients widely understand them, and they're worth the three extra lines of code.
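Pulled out as a pure helper (the name is mine), the header math is easy to sanity-check — note that Remaining clamps at zero rather than going negative:

```typescript
// Hypothetical helper computing the three header values.
function rateLimitHeaders(limit: number, count: number, resetAtMs: number) {
  return {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, limit - count)), // clamps at 0
    "X-RateLimit-Reset": String(Math.ceil(resetAtMs / 1000)),    // epoch seconds
  }
}

// A client already over the limit sees Remaining: 0, not a negative number.
const h = rateLimitHeaders(100, 103, 1_700_000_500_123)
```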


Should you even write your own?

Honestly, for most apps, no. express-rate-limit with a Redis store handles all of this and it’s tested harder than whatever you’ll write. If you’re in Fastify, @fastify/rate-limit is great. If you’re using a framework that has one, use it.

I wrote my own because I wanted to understand what was happening, and because our setup had some specific needs around keying strategy. Most people don’t. That’s fine.

But if you do write one — or if you’re just auditing the library you’re using — the mistakes above are worth knowing about. Most rate-limiter bugs aren’t bugs in the counting logic. They’re design mistakes about what you’re counting, where you’re storing it, and when it resets.


Rate limiting is a pile of small decisions dressed up as a simple feature. You’ll get the counting logic right in ten minutes. The other 90% is figuring out what to key on, where to store state, what window shape you actually want, and what to return when someone hits the limit. None of that is hard. All of it is easy to get wrong the first time. That’s the whole post.