diff --git a/app/vmauth/README.md b/app/vmauth/README.md
index 693d211dd..c43a78d9a 100644
--- a/app/vmauth/README.md
+++ b/app/vmauth/README.md
@@ -28,7 +28,36 @@ accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.co
 
 ## Load balancing
 
-Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
+Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
+In the latter case `vmauth` balances load among the configured urls in a least-loaded round-robin manner.
+`vmauth` retries failing `GET` requests across the configured list of urls.
+This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes
+in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
+
+## Concurrency limiting
+
+`vmauth` limits the number of concurrent requests it can proxy according to the following command-line flags:
+
+- `-maxConcurrentRequests` limits the global number of concurrent requests `vmauth` can serve across all the configured users.
+- `-maxConcurrentPerUserRequests` limits the number of concurrent requests `vmauth` can serve per each configured user.
+
+It is also possible to set an individual limit on the number of concurrent requests for each user
+with the `max_concurrent_requests` option - see [auth config example](#auth-config).
+
+`vmauth` responds with `429 Too Many Requests` HTTP error when the number of concurrent requests exceeds the configured limits.
+
+The following [metrics](#monitoring) related to concurrency limits are exposed by `vmauth`:
+
+- `vmauth_concurrent_requests_capacity` - the global limit on the number of concurrent requests `vmauth` can serve.
+  It is set via `-maxConcurrentRequests` command-line flag.
+- `vmauth_concurrent_requests_current` - the current number of concurrent requests `vmauth` processes.
+- `vmauth_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
+  because the global concurrency limit has been reached.
+- `vmauth_user_concurrent_requests_capacity{username="..."}` - the limit on the number of concurrent requests for the given `username`.
+- `vmauth_user_concurrent_requests_current{username="..."}` - the current number of concurrent requests for the given `username`.
+- `vmauth_user_concurrent_requests_limit_reached_total{username="..."}` - the number of requests rejected with `429 Too Many Requests` error
+  because the concurrency limit has been reached for the given `username`.
+
 
 ## Auth config
 
@@ -55,26 +84,27 @@ users:
   headers:
   - "X-Scope-OrgID: foobar"
 
-  # The user for querying local single-node VictoriaMetrics.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be proxied to http://localhost:8428 .
+  # are proxied to http://localhost:8428 .
   # For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
+  #
+  # The given user can send at most 10 concurrent requests according to the provided max_concurrent_requests.
+  # Excess concurrent requests are rejected with 429 HTTP status code.
+  # See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
 - username: "local-single-node"
   password: "***"
   url_prefix: "http://localhost:8428"
+  max_concurrent_requests: 10
 
-  # The user for querying local single-node VictoriaMetrics with extra_label team=dev.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be routed to http://localhost:8428 with extra_label=team=dev query arg.
+  # are proxied to http://localhost:8428 with extra_label=team=dev query arg.
   # For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
-- username: "local-single-node"
+- username: "local-single-node2"
   password: "***"
   url_prefix: "http://localhost:8428?extra_label=team=dev"
 
-  # The user for querying account 123 in VictoriaMetrics cluster
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
+  # are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
   # For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
   #   - http://vmselect1:8481/select/123/prometheus/api/v1/select
   #   - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -84,10 +114,8 @@ users:
   - "http://vmselect1:8481/select/123/prometheus"
   - "http://vmselect2:8481/select/123/prometheus"
 
-  # The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
+  # are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
   # For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
   #   - http://vminsert1:8480/insert/42/prometheus/api/v1/write
   #   - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -265,7 +293,7 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
   -httpListenAddr.useProxyProtocol
      Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
   -internStringMaxLen int
-     The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
+     The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
   -logInvalidAuthTokens
      Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
   -loggerDisableTimestamps
@@ -284,8 +312,10 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
      Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
   -loggerWarnsPerSecondLimit int
      Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
+  -maxConcurrentPerUserRequests int
+     The maximum number of concurrent requests vmauth can process per each configured user. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option in per-user config (default 300)
   -maxConcurrentRequests int
-     The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxIdleConnsPerBackend (default 1000)
+     The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options (default 1000)
   -maxIdleConnsPerBackend int
      The maximum number of idle connections vmauth can open per each backend host. See also -maxConcurrentRequests (default 100)
   -memory.allowedBytes size
diff --git a/app/vmauth/auth_config.go b/app/vmauth/auth_config.go
index 19dc197e3..a316f2a52 100644
--- a/app/vmauth/auth_config.go
+++ b/app/vmauth/auth_config.go
@@ -13,6 +13,7 @@ import (
 	"sync/atomic"
 
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
@@ -32,17 +33,43 @@ type AuthConfig struct {
 
 // UserInfo is user information read from authConfigPath
 type UserInfo struct {
-	Name        string     `yaml:"name,omitempty"`
-	BearerToken string     `yaml:"bearer_token,omitempty"`
-	Username    string     `yaml:"username,omitempty"`
-	Password    string     `yaml:"password,omitempty"`
-	URLPrefix   *URLPrefix `yaml:"url_prefix,omitempty"`
-	URLMaps     []URLMap   `yaml:"url_map,omitempty"`
-	Headers     []Header   `yaml:"headers,omitempty"`
+	Name                  string     `yaml:"name,omitempty"`
+	BearerToken           string     `yaml:"bearer_token,omitempty"`
+	Username              string     `yaml:"username,omitempty"`
+	Password              string     `yaml:"password,omitempty"`
+	URLPrefix             *URLPrefix `yaml:"url_prefix,omitempty"`
+	URLMaps               []URLMap   `yaml:"url_map,omitempty"`
+	Headers               []Header   `yaml:"headers,omitempty"`
+	MaxConcurrentRequests int        `yaml:"max_concurrent_requests,omitempty"`
+
+	concurrencyLimitCh      chan struct{}
+	concurrencyLimitReached *metrics.Counter
 
 	requests *metrics.Counter
 }
 
+func (ui *UserInfo) beginConcurrencyLimit() error {
+	select {
+	case ui.concurrencyLimitCh <- struct{}{}:
+		return nil
+	default:
+		ui.concurrencyLimitReached.Inc()
+		return fmt.Errorf("cannot handle more than %d concurrent requests from user %s", ui.getMaxConcurrentRequests(), ui.name())
+	}
+}
+
+func (ui *UserInfo) endConcurrencyLimit() {
+	<-ui.concurrencyLimitCh
+}
+
+func (ui *UserInfo) getMaxConcurrentRequests() int {
+	mcr := ui.MaxConcurrentRequests
+	if mcr <= 0 || mcr > *maxConcurrentPerUserRequests {
+		mcr = *maxConcurrentPerUserRequests
+	}
+	return mcr
+}
+
 // Header is `Name: Value` http header, which must be added to the proxied request.
 type Header struct {
 	Name  string
@@ -85,14 +112,75 @@ type SrcPath struct {
 
 // URLPrefix represents pased `url_prefix`
 type URLPrefix struct {
-	n    uint32
-	urls []*url.URL
+	n   uint32
+	bus []*backendURL
 }
 
-func (up *URLPrefix) getNextURL() *url.URL {
+type backendURL struct {
+	brokenDeadline     uint64
+	concurrentRequests int32
+	url                *url.URL
+}
+
+func (bu *backendURL) isBroken() bool {
+	ct := fasttime.UnixTimestamp()
+	return ct < atomic.LoadUint64(&bu.brokenDeadline)
+}
+
+func (bu *backendURL) setBroken() {
+	deadline := fasttime.UnixTimestamp() + 3
+	atomic.StoreUint64(&bu.brokenDeadline, deadline)
+}
+
+func (bu *backendURL) put() {
+	atomic.AddInt32(&bu.concurrentRequests, -1)
+}
+
+func (up *URLPrefix) getBackendsCount() int {
+	return len(up.bus)
+}
+
+// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
+//
+// backendURL.put() must be called on the returned backendURL after the request is complete.
+func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
+	bus := up.bus
+	if len(bus) == 1 {
+		// Fast path - return the only backend url.
+		bu := bus[0]
+		atomic.AddInt32(&bu.concurrentRequests, 1)
+		return bu
+	}
+
+	// Slow path - select other backend urls.
 	n := atomic.AddUint32(&up.n, 1)
-	idx := n % uint32(len(up.urls))
-	return up.urls[idx]
+
+	for i := uint32(0); i < uint32(len(bus)); i++ {
+		idx := (n + i) % uint32(len(bus))
+		bu := bus[idx]
+		if bu.isBroken() {
+			continue
+		}
+		if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
+			// Fast path - return the backend with zero concurrently executed requests.
+			return bu
+		}
+	}
+
+	// Slow path - return the backend with the minimum number of concurrently executed requests.
+	buMin := bus[n%uint32(len(bus))]
+	minRequests := atomic.LoadInt32(&buMin.concurrentRequests)
+	for _, bu := range bus {
+		if bu.isBroken() {
+			continue
+		}
+		if n := atomic.LoadInt32(&bu.concurrentRequests); n < minRequests {
+			buMin = bu
+			minRequests = n
+		}
+	}
+	atomic.AddInt32(&buMin.concurrentRequests, 1)
+	return buMin
 }
 
 // UnmarshalYAML unmarshals up from yaml.
@@ -121,31 +209,33 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
 	default:
 		return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
 	}
-	pus := make([]*url.URL, len(urls))
+	bus := make([]*backendURL, len(urls))
 	for i, u := range urls {
 		pu, err := url.Parse(u)
 		if err != nil {
 			return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
 		}
-		pus[i] = pu
+		bus[i] = &backendURL{
+			url: pu,
+		}
 	}
-	up.urls = pus
+	up.bus = bus
 	return nil
 }
 
 // MarshalYAML marshals up to yaml.
 func (up *URLPrefix) MarshalYAML() (interface{}, error) {
 	var b []byte
-	if len(up.urls) == 1 {
-		u := up.urls[0].String()
+	if len(up.bus) == 1 {
+		u := up.bus[0].url.String()
 		b = strconv.AppendQuote(b, u)
 		return string(b), nil
 	}
 	b = append(b, '[')
-	for i, pu := range up.urls {
-		u := pu.String()
+	for i, bu := range up.bus {
+		u := bu.url.String()
 		b = strconv.AppendQuote(b, u)
-		if i+1 < len(up.urls) {
+		if i+1 < len(up.bus) {
 			b = append(b, ',')
 		}
 	}
@@ -298,29 +388,44 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
 		if len(ui.URLMaps) == 0 && ui.URLPrefix == nil {
 			return nil, fmt.Errorf("missing `url_prefix`")
 		}
+		name := ui.name()
 		if ui.BearerToken != "" {
-			name := "bearer_token"
-			if ui.Name != "" {
-				name = ui.Name
-			}
 			if ui.Password != "" {
 				return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
 			}
 			ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
 		}
 		if ui.Username != "" {
-			name := ui.Username
-			if ui.Name != "" {
-				name = ui.Name
-			}
 			ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
 		}
+		mcr := ui.getMaxConcurrentRequests()
+		ui.concurrencyLimitCh = make(chan struct{}, mcr)
+		ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
+		_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
+			return float64(cap(ui.concurrencyLimitCh))
+		})
+		_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
+			return float64(len(ui.concurrencyLimitCh))
+		})
 		byAuthToken[at1] = ui
 		byAuthToken[at2] = ui
 	}
 	return byAuthToken, nil
 }
 
+func (ui *UserInfo) name() string {
+	if ui.Name != "" {
+		return ui.Name
+	}
+	if ui.Username != "" {
+		return ui.Username
+	}
+	if ui.BearerToken != "" {
+		return "bearer_token"
+	}
+	return ""
+}
+
 func getAuthTokens(bearerToken, username, password string) (string, string) {
 	if bearerToken != "" {
 		// Accept the bearerToken as Basic Auth username with empty password
@@ -342,12 +447,12 @@ func getAuthToken(bearerToken, username, password string) string {
 }
 
 func (up *URLPrefix) sanitize() error {
-	for i, pu := range up.urls {
-		puNew, err := sanitizeURLPrefix(pu)
+	for _, bu := range up.bus {
+		puNew, err := sanitizeURLPrefix(bu.url)
 		if err != nil {
 			return err
 		}
-		up.urls[i] = puNew
+		bu.url = puNew
 	}
 	return nil
 }
diff --git a/app/vmauth/auth_config_test.go b/app/vmauth/auth_config_test.go
index d0a2f12ba..6771513ef 100644
--- a/app/vmauth/auth_config_test.go
+++ b/app/vmauth/auth_config_test.go
@@ -218,11 +218,13 @@ users:
 - username: foo
   password: bar
   url_prefix: http://aaa:343/bbb
+  max_concurrent_requests: 5
 `, map[string]*UserInfo{
 		getAuthToken("", "foo", "bar"): {
-			Username:  "foo",
-			Password:  "bar",
-			URLPrefix: mustParseURL("http://aaa:343/bbb"),
+			Username:              "foo",
+			Password:              "bar",
+			URLPrefix:             mustParseURL("http://aaa:343/bbb"),
+			MaxConcurrentRequests: 5,
 		},
 	})
 
@@ -390,15 +392,17 @@ func mustParseURL(u string) *URLPrefix {
 }
 
 func mustParseURLs(us []string) *URLPrefix {
-	pus := make([]*url.URL, len(us))
+	bus := make([]*backendURL, len(us))
 	for i, u := range us {
 		pu, err := url.Parse(u)
 		if err != nil {
 			panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
 		}
-		pus[i] = pu
+		bus[i] = &backendURL{
+			url: pu,
+		}
 	}
 	return &URLPrefix{
-		urls: pus,
+		bus: bus,
 	}
 }
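Note on `max_concurrent_requests`: the value parsed in the test above sizes a buffered channel that acts as a non-blocking semaphore. Below is a minimal sketch of that pattern together with the clamping done by `getMaxConcurrentRequests`. The `limiter` type, `effectiveLimit` and `errLimitReached` are hypothetical names introduced for this illustration only; they do not exist in vmauth.

```go
package main

import (
	"errors"
	"fmt"
)

// errLimitReached is a hypothetical sentinel error used by this sketch.
var errLimitReached = errors.New("concurrency limit reached")

// limiter is a hypothetical stand-in for the per-user concurrencyLimitCh.
type limiter struct {
	ch chan struct{}
}

// effectiveLimit mirrors the clamping in getMaxConcurrentRequests:
// a non-positive or too large per-user value falls back to the flag value.
func effectiveLimit(userLimit, flagLimit int) int {
	if userLimit <= 0 || userLimit > flagLimit {
		return flagLimit
	}
	return userLimit
}

func newLimiter(n int) *limiter {
	return &limiter{ch: make(chan struct{}, n)}
}

// begin tries to occupy a slot without blocking.
func (l *limiter) begin() error {
	select {
	case l.ch <- struct{}{}:
		return nil
	default:
		return errLimitReached
	}
}

// end releases a slot taken by a successful begin.
func (l *limiter) end() {
	<-l.ch
}

func main() {
	l := newLimiter(effectiveLimit(2, 300))
	fmt.Println(l.begin(), l.begin(), l.begin()) // the third call exceeds the limit of 2
	l.end()
	fmt.Println(l.begin()) // a slot is free again
}
```

A buffered channel also gives the two gauges almost for free: `cap(ch)` is the capacity and `len(ch)` is the current number of occupied slots.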
diff --git a/app/vmauth/example_config.yml b/app/vmauth/example_config.yml
index a505c4854..461b42a86 100644
--- a/app/vmauth/example_config.yml
+++ b/app/vmauth/example_config.yml
@@ -18,26 +18,27 @@ users:
   headers:
   - "X-Scope-OrgID: foobar"
 
-  # The user for querying local single-node VictoriaMetrics.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be proxied to http://localhost:8428 .
+  # are proxied to http://localhost:8428 .
   # For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
+  #
+  # The given user can send at most 10 concurrent requests according to the provided max_concurrent_requests.
+  # Excess concurrent requests are rejected with 429 HTTP status code.
+  # See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
 - username: "local-single-node"
   password: "***"
   url_prefix: "http://localhost:8428"
+  max_concurrent_requests: 10
 
-  # The user for querying local single-node VictoriaMetrics with extra_label team=dev.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be routed to http://localhost:8428 with extra_label=team=dev query arg.
+  # are proxied to http://localhost:8428 with extra_label=team=dev query arg.
   # For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
-- username: "local-single-node"
+- username: "local-single-node2"
   password: "***"
   url_prefix: "http://localhost:8428?extra_label=team=dev"
 
-  # The user for querying account 123 in VictoriaMetrics cluster
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
+  # are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
   # For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
   #   - http://vmselect1:8481/select/123/prometheus/api/v1/select
   #   - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -47,10 +48,8 @@ users:
   - "http://vmselect1:8481/select/123/prometheus"
   - "http://vmselect2:8481/select/123/prometheus"
 
-  # The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
+  # are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
   # For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
   #   - http://vminsert1:8480/insert/42/prometheus/api/v1/write
   #   - http://vminsert2:8480/insert/42/prometheus/api/v1/write
diff --git a/app/vmauth/main.go b/app/vmauth/main.go
index c68d6e8ef..ec180004d 100644
--- a/app/vmauth/main.go
+++ b/app/vmauth/main.go
@@ -33,7 +33,10 @@ var (
 		"See also -maxConcurrentRequests")
 	responseTimeout       = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
 	maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process. Other requests are rejected with "+
-		"'429 Too Many Requests' http status code. See also -maxIdleConnsPerBackend")
+		"'429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
+	maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
+		"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
+		"in per-user config")
 	reloadAuthKey        = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
 	logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
 		`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
@@ -107,32 +110,54 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
 		return true
 	}
 	ui.requests.Inc()
-	targetURL, headers, err := createTargetURL(ui, r.URL)
-	if err != nil {
-		httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
-		return true
-	}
 
 	// Limit the concurrency of requests to backends
 	concurrencyLimitOnce.Do(concurrencyLimitInit)
 	select {
 	case concurrencyLimitCh <- struct{}{}:
-	default:
-		concurrentRequestsLimitReachedTotal.Inc()
-		w.Header().Add("Retry-After", "10")
-		err := &httpserver.ErrorWithStatusCode{
-			Err:        fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh)),
-			StatusCode: http.StatusTooManyRequests,
+		if err := ui.beginConcurrencyLimit(); err != nil {
+			handleConcurrencyLimitError(w, r, err)
+			<-concurrencyLimitCh
+			return true
 		}
-		httpserver.Errorf(w, r, "%s", err)
+	default:
+		concurrentRequestsLimitReached.Inc()
+		err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
+		handleConcurrencyLimitError(w, r, err)
 		return true
 	}
-	processRequest(w, r, targetURL, headers)
+	processRequest(w, r, ui)
+	ui.endConcurrencyLimit()
 	<-concurrencyLimitCh
 	return true
 }
 
-func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) {
+func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
+	u := normalizeURL(r.URL)
+	up, headers, err := ui.getURLPrefixAndHeaders(u)
+	if err != nil {
+		httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
+		return
+	}
+	maxAttempts := up.getBackendsCount()
+	for i := 0; i < maxAttempts; i++ {
+		bu := up.getLeastLoadedBackendURL()
+		targetURL := mergeURLs(bu.url, u)
+		ok := tryProcessingRequest(w, r, targetURL, headers)
+		bu.put()
+		if ok {
+			return
+		}
+		bu.setBroken()
+	}
+	err = &httpserver.ErrorWithStatusCode{
+		Err:        fmt.Errorf("all the backends for the user %q are unavailable", ui.name()),
+		StatusCode: http.StatusServiceUnavailable,
+	}
+	httpserver.Errorf(w, r, "%s", err)
+}
+
+func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) bool {
 	// This code has been copied from net/http/httputil/reverseproxy.go
 	req := sanitizeRequestHeaders(r)
 	req.URL = targetURL
@@ -142,12 +167,20 @@ func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL,
 	transportOnce.Do(transportInit)
 	res, err := transport.RoundTrip(req)
 	if err != nil {
-		err = &httpserver.ErrorWithStatusCode{
-			Err:        fmt.Errorf("error when proxying the request to %q: %s", targetURL, err),
-			StatusCode: http.StatusBadGateway,
+		remoteAddr := httpserver.GetQuotedRemoteAddr(r)
+		requestURI := httpserver.GetRequestURI(r)
+		if r.Method == "POST" || r.Method == "PUT" {
+			// It is impossible to retry POST and PUT requests,
+			// since we already proxied the request body to the backend.
+			err = &httpserver.ErrorWithStatusCode{
+				Err:        fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
+				StatusCode: http.StatusServiceUnavailable,
+			}
+			httpserver.Errorf(w, r, "%s", err)
+			return true
 		}
-		httpserver.Errorf(w, r, "%s", err)
-		return
+		logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying the request to %q: %s", remoteAddr, requestURI, targetURL, err)
+		return false
 	}
 	removeHopHeaders(res.Header)
 	copyHeader(w.Header(), res.Header)
@@ -162,8 +195,9 @@ func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL,
 		remoteAddr := httpserver.GetQuotedRemoteAddr(r)
 		requestURI := httpserver.GetRequestURI(r)
 		logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
-		return
+		return true
 	}
+	return true
 }
 
 var copyBufPool bytesutil.ByteBufferPool
@@ -269,7 +303,7 @@ func concurrencyLimitInit() {
 	})
 }
 
-var concurrentRequestsLimitReachedTotal = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
+var concurrentRequestsLimitReached = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
 
 func usage() {
 	const s = `
@@ -279,3 +313,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
 `
 	flagutil.Usage(s)
 }
+
+func handleConcurrencyLimitError(w http.ResponseWriter, r *http.Request, err error) {
+	w.Header().Add("Retry-After", "10")
+	err = &httpserver.ErrorWithStatusCode{
+		Err:        err,
+		StatusCode: http.StatusTooManyRequests,
+	}
+	httpserver.Errorf(w, r, "%s", err)
+}
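Note on the retry loop in `processRequest`: it makes at most one attempt per configured backend and marks a failed backend as broken before moving on. The sketch below shows only that control flow under simplifying assumptions: `backend`, `pick`, `proxyWithRetries` and `errAllUnavailable` are hypothetical names, the broken state is a plain boolean instead of the short-lived deadline used in the change, and the `send` callback stands in for the actual proxying.

```go
package main

import (
	"errors"
	"fmt"
)

// errAllUnavailable is a hypothetical error returned when every attempt fails.
var errAllUnavailable = errors.New("all the backends are unavailable")

// backend is a hypothetical stand-in for backendURL.
type backend struct {
	name   string
	broken bool
}

// pick returns the first backend that is not marked as broken.
// It falls back to the first backend if all of them are broken.
func pick(bs []*backend) *backend {
	for _, b := range bs {
		if !b.broken {
			return b
		}
	}
	return bs[0]
}

// proxyWithRetries tries up to len(bs) backends and returns the name of the one that succeeded.
func proxyWithRetries(bs []*backend, send func(b *backend) bool) (string, error) {
	for i := 0; i < len(bs); i++ {
		b := pick(bs)
		if send(b) {
			return b.name, nil
		}
		// Skip this backend on the next attempts.
		b.broken = true
	}
	return "", errAllUnavailable
}

func main() {
	bs := []*backend{{name: "vmselect1"}, {name: "vmselect2"}}
	// Simulate vmselect1 being down: only vmselect2 answers.
	name, err := proxyWithRetries(bs, func(b *backend) bool { return b.name == "vmselect2" })
	fmt.Println(name, err)
}
```

Bounding the number of attempts by the number of backends keeps the worst-case latency predictable: a request never loops over the same failing set more than once.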
diff --git a/app/vmauth/target_url.go b/app/vmauth/target_url.go
index 5f3d0df63..a2849b88e 100644
--- a/app/vmauth/target_url.go
+++ b/app/vmauth/target_url.go
@@ -7,11 +7,6 @@ import (
 	"strings"
 )
 
-func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
-	pu := up.getNextURL()
-	return mergeURLs(pu, requestURI)
-}
-
 func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
 	targetURL := *uiURL
 	targetURL.Path += requestURI.Path
@@ -35,7 +30,22 @@ func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
 	return &targetURL
 }
 
-func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
+func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, []Header, error) {
+	for _, e := range ui.URLMaps {
+		for _, sp := range e.SrcPaths {
+			if sp.match(u.Path) {
+				return e.URLPrefix, e.Headers, nil
+			}
+		}
+	}
+	if ui.URLPrefix != nil {
+		return ui.URLPrefix, ui.Headers, nil
+	}
+	missingRouteRequests.Inc()
+	return nil, nil, fmt.Errorf("missing route for %q", u.String())
+}
+
+func normalizeURL(uOrig *url.URL) *url.URL {
 	u := *uOrig
 	// Prevent from attacks with using `..` in r.URL.Path
 	u.Path = path.Clean(u.Path)
@@ -52,16 +62,5 @@ func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
 		// See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1554
 		u.Path = ""
 	}
-	for _, e := range ui.URLMaps {
-		for _, sp := range e.SrcPaths {
-			if sp.match(u.Path) {
-				return e.URLPrefix.mergeURLs(&u), e.Headers, nil
-			}
-		}
-	}
-	if ui.URLPrefix != nil {
-		return ui.URLPrefix.mergeURLs(&u), ui.Headers, nil
-	}
-	missingRouteRequests.Inc()
-	return nil, nil, fmt.Errorf("missing route for %q", u.String())
+	return &u
 }
diff --git a/app/vmauth/target_url_test.go b/app/vmauth/target_url_test.go
index a942dd8ac..9403aa15d 100644
--- a/app/vmauth/target_url_test.go
+++ b/app/vmauth/target_url_test.go
@@ -13,10 +13,14 @@ func TestCreateTargetURLSuccess(t *testing.T) {
 		if err != nil {
 			t.Fatalf("cannot parse %q: %s", requestURI, err)
 		}
-		target, headers, err := createTargetURL(ui, u)
+		u = normalizeURL(u)
+		up, headers, err := ui.getURLPrefixAndHeaders(u)
 		if err != nil {
 			t.Fatalf("unexpected error: %s", err)
 		}
+		bu := up.getLeastLoadedBackendURL()
+		target := mergeURLs(bu.url, u)
+		bu.put()
 		if target.String() != expectedTarget {
 			t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
 		}
@@ -119,15 +123,16 @@ func TestCreateTargetURLFailure(t *testing.T) {
 		if err != nil {
 			t.Fatalf("cannot parse %q: %s", requestURI, err)
 		}
-		target, headers, err := createTargetURL(ui, u)
+		u = normalizeURL(u)
+		up, headers, err := ui.getURLPrefixAndHeaders(u)
 		if err == nil {
 			t.Fatalf("expecting non-nil error")
 		}
-		if target != nil {
-			t.Fatalf("unexpected target=%q; want empty string", target)
+		if up != nil {
+			t.Fatalf("unexpected non-empty up=%#v", up)
 		}
 		if headers != nil {
-			t.Fatalf("unexpected headers=%q; want empty string", headers)
+			t.Fatalf("unexpected non-empty headers=%q", headers)
 		}
 	}
 	f(&UserInfo{}, "/foo/bar")
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 2b61499ce..23ea1ce7b 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -15,6 +15,9 @@ The following tip changes can be tested by building VictoriaMetrics components f
 
 ## tip
 
+* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): add the ability to limit the number of concurrent requests on a per-user basis via `-maxConcurrentPerUserRequests` command-line flag and via `max_concurrent_requests` config option. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346) and [these docs](https://docs.victoriametrics.com/vmauth.html#concurrency-limiting).
+* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): automatically retry failing `GET` requests on all [the configured backends](https://docs.victoriametrics.com/vmauth.html#load-balancing). Previously the backend error was immediately returned to the client without retrying the request on the remaining backends.
+* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): choose the backend with the minimum number of concurrently executed requests [among the configured backends](https://docs.victoriametrics.com/vmauth.html#load-balancing) in a round-robin manner for serving the incoming requests. This allows spreading the load among backends more evenly, while improving the response time.
 * FEATURE: [vmalert enterprise](https://docs.victoriametrics.com/vmalert.html): add ability to read alerting and recording rules from S3, GCS or S3-compatible object storage. See [these docs](https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage).
 
 ## [v1.87.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.87.1)
diff --git a/docs/Cluster-VictoriaMetrics.md b/docs/Cluster-VictoriaMetrics.md
index 5f147ff60..fc0177733 100644
--- a/docs/Cluster-VictoriaMetrics.md
+++ b/docs/Cluster-VictoriaMetrics.md
@@ -52,7 +52,8 @@ Some facts about tenants in VictoriaMetrics:
 - Each `accountID` and `projectID` is identified by an arbitrary 32-bit integer in the range `[0 .. 2^32)`.
 If `projectID` is missing, then it is automatically assigned to `0`. It is expected that other information about tenants
 such as auth tokens, tenant names, limits, accounting, etc. is stored in a separate relational database. This database must be managed
-by a separate service sitting in front of VictoriaMetrics cluster such as [vmauth](https://docs.victoriametrics.com/vmauth.html) or [vmgateway](https://docs.victoriametrics.com/vmgateway.html). [Contact us](mailto:info@victoriametrics.com) if you need assistance with such service.
+by a separate service sitting in front of VictoriaMetrics cluster such as [vmauth](https://docs.victoriametrics.com/vmauth.html)
+or [vmgateway](https://docs.victoriametrics.com/vmgateway.html). [Contact us](mailto:info@victoriametrics.com) if you need assistance with such service.
 
 - Tenants are automatically created when the first data point is written into the given tenant.
 
@@ -172,7 +173,8 @@ It is recommended to run at least two nodes for each service for high availabili
 
 It is preferred to run many small `vmstorage` nodes over a few big `vmstorage` nodes, since this reduces the workload increase on the remaining `vmstorage` nodes when some of `vmstorage` nodes become temporarily unavailable.
 
-An http load balancer such as [vmauth](https://docs.victoriametrics.com/vmauth.html) or `nginx` must be put in front of `vminsert` and `vmselect` nodes. It must contain the following routing configs according to [the url format](#url-format):
+An http load balancer such as [vmauth](https://docs.victoriametrics.com/vmauth.html) or `nginx` must be put in front of `vminsert` and `vmselect` nodes.
+It must contain the following routing configs according to [the url format](#url-format):
 
 - requests starting with `/insert` must be routed to port `8480` on `vminsert` nodes.
 - requests starting with `/select` must be routed to port `8481` on `vmselect` nodes.
@@ -475,7 +477,8 @@ if some of its components are temporarily unavailable.
 
 VictoriaMetrics cluster remains available if the following conditions are met:
 
-- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
+- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes
+  ([vmauth](https://docs.victoriametrics.com/vmauth.html) stops routing requests to unavailable nodes).
 
 - At least a single `vminsert` node must remain available in the cluster for processing data ingestion workload.
   The remaining active `vminsert` nodes must have enough compute capacity (CPU, RAM, network bandwidth)
diff --git a/docs/vmauth.md b/docs/vmauth.md
index 0d5dfbefa..5ce320aaf 100644
--- a/docs/vmauth.md
+++ b/docs/vmauth.md
@@ -32,7 +32,36 @@ accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.co
 
 ## Load balancing
 
-Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
+Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
+In the latter case `vmauth` balances load among the configured urls in a least-loaded round-robin manner.
+`vmauth` retries failing `GET` requests across the configured list of urls.
+This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes
+in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
+
+## Concurrency limiting
+
+`vmauth` limits the number of concurrent requests it can proxy according to the following command-line flags:
+
+- `-maxConcurrentRequests` limits the global number of concurrent requests `vmauth` can serve across all the configured users.
+- `-maxConcurrentPerUserRequests` limits the number of concurrent requests `vmauth` can serve per each configured user.
+
+It is also possible to set individual limits on the number of concurrent requests per each user
+with the `max_concurrent_requests` option - see [auth config example](#auth-config).
+
+`vmauth` responds with `429 Too Many Requests` HTTP error when the number of concurrent requests exceeds the configured limits.
+
+The following [metrics](#monitoring) related to concurrency limits are exposed by `vmauth`:
+
+- `vmauth_concurrent_requests_capacity` - the global limit on the number of concurrent requests `vmauth` can serve.
+  It is set via `-maxConcurrentRequests` command-line flag.
+- `vmauth_concurrent_requests_current` - the current number of concurrent requests `vmauth` processes.
+- `vmauth_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
+  because the global concurrency limit has been reached.
+- `vmauth_user_concurrent_requests_capacity{username="..."}` - the limit on the number of concurrent requests for the given `username`.
+- `vmauth_user_concurrent_requests_current{username="..."}` - the current number of concurrent requests for the given `username`.
+- `vmauth_user_concurrent_requests_limit_reached_total{username="..."}` - the number of requests rejected with `429 Too Many Requests` error
+  because the concurrency limit has been reached for the given `username`.
+
 
 ## Auth config
 
@@ -59,26 +88,27 @@ users:
   headers:
   - "X-Scope-OrgID: foobar"
 
-  # The user for querying local single-node VictoriaMetrics.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be proxied to http://localhost:8428 .
+  # are proxied to http://localhost:8428 .
   # For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
+  #
+  # The given user can send a maximum of 10 concurrent requests according to the provided max_concurrent_requests.
+  # Excess concurrent requests are rejected with 429 HTTP status code.
+  # See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
 - username: "local-single-node"
   password: "***"
   url_prefix: "http://localhost:8428"
+  max_concurrent_requests: 10
 
-  # The user for querying local single-node VictoriaMetrics with extra_label team=dev.
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be routed to http://localhost:8428 with extra_label=team=dev query arg.
+  # are proxied to http://localhost:8428 with extra_label=team=dev query arg.
   # For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
-- username: "local-single-node"
+- username: "local-single-node2"
   password: "***"
   url_prefix: "http://localhost:8428?extra_label=team=dev"
 
-  # The user for querying account 123 in VictoriaMetrics cluster
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
+  # are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
   # For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
   #   - http://vmselect1:8481/select/123/prometheus/api/v1/select
   #   - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -88,10 +118,8 @@ users:
   - "http://vmselect1:8481/select/123/prometheus"
   - "http://vmselect2:8481/select/123/prometheus"
 
-  # The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
-  # See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
   # All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
-  # will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
+  # are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
   # For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
   #   - http://vminsert1:8480/insert/42/prometheus/api/v1/write
   #   - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -269,7 +297,7 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
   -httpListenAddr.useProxyProtocol
      Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
   -internStringMaxLen int
-     The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
+     The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
   -logInvalidAuthTokens
      Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
   -loggerDisableTimestamps
@@ -288,8 +316,10 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
      Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
   -loggerWarnsPerSecondLimit int
      Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
+  -maxConcurrentPerUserRequests int
+     The maximum number of concurrent requests vmauth can process per each configured user. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option in per-user config (default 300)
   -maxConcurrentRequests int
-     The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxIdleConnsPerBackend (default 1000)
+     The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options (default 1000)
   -maxIdleConnsPerBackend int
      The maximum number of idle connections vmauth can open per each backend host. See also -maxConcurrentRequests (default 100)
   -memory.allowedBytes size