Node.js Best Practices

Quality patterns for Node.js services in production: observability, security, reliability, and operational readiness. Complements the nodejs skill (which covers runtime and module syntax).

When to Use

Reviewing a Node.js service for production readiness
Evaluating logging and observability strategy
Auditing security posture (secrets, headers, rate limiting)
Designing startup/shutdown lifecycle

Don’t use for:

Express route design (use express)
npm dependency management (use nodejs)
TypeScript configuration (use typescript)

Critical Patterns

✅ REQUIRED [CRITICAL]: Structured Logging

Emit JSON logs with consistent fields. Never use console.log in production services.

// ❌ WRONG — unstructured, unsearchable
console.log('User created:', user.id);

// ✅ CORRECT — structured, queryable
logger.info({ userId: user.id, email: user.email }, 'User created');
// Output (field format varies by logger; pino writes numeric levels such as 30): {"level":"info","userId":"123","email":"...","msg":"User created","time":...}

Use pino or winston. Required fields: level, msg, time, requestId (from context).
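
One way to get requestId "from context" is Node's AsyncLocalStorage. The sketch below is illustrative only: log and withRequestContext are hypothetical helpers, and the hand-rolled JSON logger merely stands in for pino or winston so the example runs without dependencies.

```javascript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Holds per-request context; each request runs inside its own store.
const requestContext = new AsyncLocalStorage();

// Hypothetical minimal logger: writes one JSON object per line with the
// required fields. A real service would use pino or winston here.
function log(level, fields, msg) {
  const store = requestContext.getStore();
  const entry = {
    level,
    time: Date.now(),
    requestId: store?.requestId, // omitted from the JSON when outside a request
    ...fields,
    msg,
  };
  const line = JSON.stringify(entry);
  process.stdout.write(line + '\n');
  return line;
}

// Run the handling of one request inside a fresh context so every log call
// made within it, including calls after an await, sees the same requestId.
function withRequestContext(handler) {
  return requestContext.run({ requestId: randomUUID() }, handler);
}

// Usage: both lines carry the same requestId without passing it explicitly.
await withRequestContext(async () => {
  log('info', { userId: '123' }, 'User created');
  await Promise.resolve();
  log('info', { userId: '123' }, 'Welcome email queued');
});
```

In an HTTP server, withRequestContext would wrap the request handler (or live in a middleware), optionally reusing an incoming request ID header instead of generating a new one.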

✅ REQUIRED [CRITICAL]: Graceful Shutdown

Drain in-flight requests and close connections before exiting on SIGTERM/SIGINT.

// ❌ WRONG — kills in-flight requests and open DB connections
process.exit(0);

// ✅ CORRECT — drain then exit
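const shutdown = async () => {
  // server.close() takes a callback and returns no promise, so wrap it to make await wait
  await new Promise((resolve) => server.close(resolve)); // stop accepting connections, drain in-flight requests
  await db.pool.end();       // close DB pool
  process.exit(0);
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);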
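
Draining can hang if a connection never finishes, so a shutdown path usually needs an upper bound. The following is a minimal sketch assuming Node's built-in http server; gracefulShutdown, timeoutMs, and closeResources are hypothetical names chosen for illustration, not part of any library.

```javascript
import http from 'node:http';

// Resolves 'drained' once the server has closed and resources are released,
// or 'timeout' if that takes longer than timeoutMs. closeResources is a
// hypothetical hook for DB pools, queues, and similar cleanup.
async function gracefulShutdown(server, { timeoutMs = 10_000, closeResources = async () => {} } = {}) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  const drain = (async () => {
    // server.close() is callback-based; wrap it so it can be awaited.
    await new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
    });
    await closeResources();
    return 'drained';
  })();
  try {
    return await Promise.race([drain, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

const server = http.createServer((req, res) => res.end('ok'));
server.listen(0); // port 0 picks a free port; a real service would use its configured port

let shuttingDown = false;
for (const signal of ['SIGTERM', 'SIGINT']) {
  process.on(signal, () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    gracefulShutdown(server).then(
      (outcome) => process.exit(outcome === 'drained' ? 0 : 1),
      () => process.exit(1),
    );
  });
}
```

Exiting non-zero on timeout makes a stuck drain visible to the supervisor. Idle keep-alive connections can delay close() on older Node versions; server.closeIdleConnections() (Node 18.2+) is one way to release them.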

❌ NEVER: Secrets Without Startup Validation

Validate all required environment variables at boot. Fail fast with a clear error.

// ❌ WRONG — missing env var surfaces at runtime during a user request
const conn = await db.connect(process.env.DATABASE_URL);

// ✅ CORRECT — fail at startup with actionable message
import { z } from 'zod';
const env = z.object({
  DATABASE_URL: z.string().url(),
  JWT_SECRET: z.string().min(32),
  PORT: z.coerce.number().default(3000),
}).parse(process.env);
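
Where adding a schema library is not an option, the same fail-fast idea can be sketched in plain JavaScript. loadEnv and isUrl are hypothetical helpers written for this illustration; the checks mirror the three fields in the schema above and report every problem in one error.

```javascript
// Returns true when value parses as an absolute URL.
function isUrl(value) {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Validates the variables the service needs and throws a single error
// listing every problem, so a bad deploy fails before serving traffic.
function loadEnv(source = process.env) {
  const problems = [];

  if (!source.DATABASE_URL) {
    problems.push('Missing required env: DATABASE_URL');
  } else if (!isUrl(source.DATABASE_URL)) {
    problems.push('Invalid env: DATABASE_URL must be a URL');
  }

  if (!source.JWT_SECRET) {
    problems.push('Missing required env: JWT_SECRET');
  } else if (source.JWT_SECRET.length < 32) {
    problems.push('Invalid env: JWT_SECRET must be at least 32 characters');
  }

  // PORT is optional and defaults to 3000.
  const rawPort = source.PORT;
  const port = rawPort === undefined || rawPort === '' ? 3000 : Number(rawPort);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    problems.push('Invalid env: PORT must be an integer between 0 and 65535');
  }

  if (problems.length > 0) {
    throw new Error(problems.join('\n'));
  }
  return Object.freeze({
    DATABASE_URL: source.DATABASE_URL,
    JWT_SECRET: source.JWT_SECRET,
    PORT: port,
  });
}

// Demonstration with an incomplete environment: the error names each gap.
try {
  loadEnv({ PORT: '8080' });
} catch (err) {
  console.error(err.message);
}
```

At boot the call would simply be loadEnv() with no arguments, placed before the server starts listening.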

✅ REQUIRED: Security Headers

Ship security headers on every response. Use your framework’s security middleware or set manually.

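// Express
import helmet from 'helmet';
app.use(helmet());

// Fastify (uses the @fastify/helmet plugin, registered rather than app.use'd)
import fastifyHelmet from '@fastify/helmet';
app.register(fastifyHelmet);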

// Hono
import { secureHeaders } from 'hono/secure-headers';
app.use(secureHeaders());

// Native http / any framework — set manually
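res.setHeader('Content-Security-Policy', "default-src 'self'"); // tighten or extend per application
res.setHeader('X-Content-Type-Options', 'nosniff');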
res.setHeader('X-Frame-Options', 'DENY');
res.setHeader('Strict-Transport-Security', 'max-age=31536000; includeSubDomains');

Minimum required headers: Content-Security-Policy, Strict-Transport-Security, X-Content-Type-Options, X-Frame-Options.

✅ REQUIRED: Error Propagation — Never Swallow

Every async path must handle rejections. Since Node 15, an unhandled rejection crashes the process by default.

// ❌ WRONG — floating promise: a rejection goes unhandled and can crash the process
asyncOperation();

// ❌ WRONG — caught but swallowed
asyncOperation().catch(() => {});

// ✅ CORRECT — propagate to caller or log + respond
asyncOperation().catch((err) => {
  logger.error({ err }, 'Operation failed');
  res.status(500).json({ error: 'Internal server error' });
});
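
Repeating .catch in every route is easy to forget, so a common approach is to wrap handlers once. The sketch below assumes an Express-style (req, res, next) signature; asyncHandler is a hypothetical helper, and console.error stands in for the service's structured logger.

```javascript
// Wraps an async route handler so that a rejection (or a synchronous throw)
// reaches the framework's error middleware through next(err) instead of
// becoming an unhandled rejection.
function asyncHandler(handler) {
  return (req, res, next) =>
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
}

// Last-resort process hook: record the failure, then exit so the supervisor
// restarts the process from a known state.
process.on('unhandledRejection', (reason) => {
  console.error(JSON.stringify({
    level: 'fatal',
    msg: 'Unhandled rejection',
    reason: String(reason),
    time: Date.now(),
  }));
  process.exit(1);
});

// Usage with an Express-style router (app is assumed to exist):
// app.get('/users/:id', asyncHandler(async (req, res) => {
//   res.json(await loadUser(req.params.id)); // loadUser is hypothetical
// }));
```

The process-level hook is a safety net for logging, not a substitute for handling errors where they occur.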

❌ NEVER: Synchronous I/O in Request Path

Synchronous calls such as fs.readFileSync and crypto.pbkdf2Sync, and CPU-heavy calls such as JSON.parse on a large payload, block the event loop for every concurrent request.

// ❌ WRONG — blocks event loop for all concurrent requests
const config = fs.readFileSync('./config.json', 'utf8');

// ✅ CORRECT — async, non-blocking
const config = await fs.promises.readFile('./config.json', 'utf8');
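
The same substitution applies to crypto.pbkdf2Sync. A minimal sketch using the callback-based crypto.pbkdf2 through util.promisify; hashPassword is a hypothetical helper, and the iteration count and key length are illustrative values rather than a recommendation.

```javascript
import { pbkdf2, randomBytes } from 'node:crypto';
import { promisify } from 'node:util';

const pbkdf2Async = promisify(pbkdf2);

// Derives the key on the libuv thread pool, so the event loop keeps serving
// other requests while the hash is computed.
async function hashPassword(password, salt = randomBytes(16)) {
  const key = await pbkdf2Async(password, salt, 100_000, 32, 'sha256');
  return { salt: salt.toString('hex'), hash: key.toString('hex') };
}

const { salt, hash } = await hashPassword('correct horse battery staple');
console.log({ saltLength: salt.length, hashLength: hash.length });
```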

✅ REQUIRED: Cluster or Worker Threads for CPU-Bound Work

A single Node.js process runs JavaScript on one thread, so it uses roughly one CPU core. On multi-core machines, run one worker process per core.

// ✅ CORRECT — use all cores (or PM2 cluster mode in production)
import cluster from 'node:cluster';
import os from 'node:os';

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  startServer();
}

For CPU-bound work in a single process (image processing, crypto), use Worker Threads instead of blocking the main thread.
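
A minimal sketch of offloading with node:worker_threads. The worker source is inlined with eval: true only so the example fits in one file; fibInWorker and the Fibonacci workload are hypothetical stand-ins for real CPU-bound work.

```javascript
import { Worker } from 'node:worker_threads';

// Inline worker source (run as a CommonJS script because of eval: true).
// A real service would normally load the worker from its own module.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Runs the computation on a separate thread and resolves with its result,
// leaving the main thread free to handle requests in the meantime.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

const result = await fibInWorker(25);
console.log(result);
```

Spawning a worker per call keeps the sketch short; under sustained load a fixed-size pool of workers avoids repeated startup cost.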

Symptom → Solution

Symptom	Cause	Fix
Server crashes unexpectedly	Unhandled promise rejection	Add `process.on('unhandledRejection')` + fix root cause
Logs not searchable in Datadog/Kibana	`console.log` plain strings	Switch to structured JSON logger (pino/winston)
Slow responses under concurrent load	Sync I/O in request path	Audit with `clinic.js`; replace with async equivalents
Secrets missing at runtime	No startup env validation	Validate with zod/joi at boot; fail fast
Requests lost during deployment	No graceful shutdown	Add SIGTERM handler with drain + close
Security scanner flags missing headers	No security-header middleware	Add helmet (Express), `@fastify/helmet` (Fastify), or `hono/secure-headers` (Hono)

Decision Tree

Service logging to console.log?
  → Replace with structured JSON logger (pino recommended)
  → Add requestId to all log entries via async context

SIGTERM/SIGINT handled?
  → No → Add graceful shutdown: stop server, drain, close DB, exit

Environment variables read without validation?
  → Add zod/joi schema validation at process start
  → Fail fast with "Missing required env: DATABASE_URL" message

Security headers configured?
  → Express → helmet · Fastify → @fastify/helmet · Hono → hono/secure-headers · native http → res.setHeader manually

Async call without .catch or try/catch?
  → Unhandled rejection risk — add error handling
  → In Express 4: use express-async-errors or wrap routes (Express 5 forwards rejected promises to error middleware)

Synchronous I/O in request handler?
  → Identify with --prof or clinic.js
  → Replace with async equivalent

Service on multi-core machine?
  → Using PM2? → pm2 start app.js -i max
  → Manual? → node:cluster with os.cpus().length workers

CPU-bound work in request handler?
  → Offload to Worker Threads to avoid blocking event loop

Example

import Fastify from 'fastify';
import { z } from 'zod';

// ✅ Startup env validation — fail fast
const env = z.object({
  DATABASE_URL: z.string().url(),
  PORT: z.coerce.number().default(3000),
}).parse(process.env);

// Fastify builds its own pino logger from this config. Fastify 5 no longer
// accepts a logger instance in `logger` (it expects `loggerInstance` instead).
const app = Fastify({ logger: { level: 'info' } });
const logger = app.log;

// ✅ Graceful shutdown
const shutdown = async () => {
  await app.close();
  logger.info('Server closed');
  process.exit(0);
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

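// Await so a bind failure (e.g. port in use) fails startup instead of going unhandled.
// Bind to 0.0.0.0 so the port is reachable from outside a container.
await app.listen({ port: env.PORT, host: '0.0.0.0' });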

Edge Cases

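Containerized deployments: Orchestrators (Kubernetes, Docker) send SIGTERM first and SIGKILL once the grace period expires. Kubernetes defaults terminationGracePeriodSeconds to 30s and docker stop waits 10s by default; set the grace period longer than your slowest in-flight request needs to drain.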

Clustering with stateful connections: WebSockets and in-memory sessions break under round-robin clustering because consecutive requests from one client can land on different workers. Share state through Redis pub/sub or use sticky load balancing at the proxy level.

Long-running background jobs: Don’t run long jobs in the web process — use a separate worker process (BullMQ, pg-boss) so the web server remains responsive.

Memory leaks in long-running processes: Monitor heap usage with process.memoryUsage(). Common causes: unbounded caches, event listener accumulation, closures capturing large objects.
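
A minimal sketch of periodic heap sampling with process.memoryUsage(). sampleMemory and startMemoryMonitor are hypothetical helpers, the 60-second interval is an arbitrary choice, and the default console.log sink stands in for the service's structured logger.

```javascript
// Converts process.memoryUsage() to megabytes (one decimal) for logging.
function sampleMemory() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMb: toMb(rss),
    heapUsedMb: toMb(heapUsed),
    heapTotalMb: toMb(heapTotal),
    externalMb: toMb(external),
  };
}

// Emits one structured sample per interval and returns a stop function.
// unref() keeps the timer from holding the process open during shutdown.
function startMemoryMonitor(
  intervalMs = 60_000,
  log = (entry) => console.log(JSON.stringify(entry)),
) {
  const timer = setInterval(() => {
    log({ level: 'info', msg: 'memory', time: Date.now(), ...sampleMemory() });
  }, intervalMs);
  timer.unref();
  return () => clearInterval(timer);
}

const stopMonitor = startMemoryMonitor();
```

A heapUsedMb value that keeps climbing across samples under steady traffic is the usual signal to take heap snapshots and compare them.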