In this article we'll walk you through HAProxy 2.0 and Beyond, and along the way also cover Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2, a CentOS 7 guide to installing, configuring, and using HAProxy via yum (with a detailed look at the HAProxy configuration file), the com.sun.proxy.$Proxy4 cannot be cast exception, and a comparison of HAProxy with Nginx's tcp_proxy_module, to help you better understand the topic.
Contents:
- HAProxy 2.0 and Beyond
- Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
- CentOS 7 high-performance Linux cluster: installing, configuring, and using HAProxy via yum, with a detailed look at the HAProxy configuration file
- The com.sun.proxy.$Proxy4 cannot be cast exception
- Comparing HAProxy with Nginx's tcp_proxy_module
HAProxy 2.0 and Beyond
Source: https://www.haproxy.com/blog/haproxy-2-0-and-beyond/ — notes on the new features in HAProxy 2.0.
HAProxy Technologies is excited to announce the release of HAProxy 2.0, bringing features critical for cloud-native and containerized environments, while retaining its industry-leading performance and reliability.
HAProxy 2.0 adds a powerful set of core features as well as completely new functionality that further improves its seamless support for integration into modern architectures. This includes Layer 7 retries, Prometheus metrics, traffic shadowing, polyglot extensibility, and gRPC support. In conjunction with this release we are also introducing the HAProxy Kubernetes Ingress Controller and the powerful HAProxy Data Plane API which provides a modern REST API for configuring and managing HAProxy. Read the release announcement here.
When HAProxy 1.8 was released in November 2017, it introduced features including Hitless Reloads, DNS Service Discovery, Dynamic Scaling with the Runtime API, and HTTP/2 at the edge. These advancements moved HAProxy along the path of supporting a variety of architectures at any scale and in any environment, while also allowing it to maintain its position as the world’s fastest software load balancer.
Since then, many important changes have happened within the core project itself, such as changing the release cadence from an annual to a biannual release cycle. The project has opened up issue submissions on its HAProxy GitHub account. This has allowed our community to continue to flourish and we’re excited to be a part of such a strong corps of contributors.
The HAProxy community provides code submissions covering new functionality and bug fixes, quality assurance testing, continuous integration environments, bug reports, and much more. Everyone has done their part to make this release possible! If you’d like to join this amazing community, you can find it on Slack, Discourse, and the HAProxy mailing list.
This release improves upon capabilities that fit the unique conditions of cloud and container environments. HAProxy 2.0 is an LTS release.
In addition, the inaugural community conference, HAProxyConf, will take place in Amsterdam, Netherlands on November 12-13, 2019. With many interesting talk suggestions already received, we are looking at an amazing conference and we hope to see you there!
We’ve put together a complete HAProxy 2.0 configuration, which allows you to follow along and get started with the latest features right away. We’ll also be hosting webinars to cover the HAProxy 2.0 release, the Data Plane API, and the Kubernetes Ingress Controller. Sign up here.
In this post, we’ll give you an overview of the following updates included in this release:
- Cloud-Native Threading & Logging
- HTTP Representation (HTX)
- End-to-End HTTP/2
- gRPC
- Layer 7 Retries
- Data Plane API
- Process Manager
- Polyglot Extensibility
- Traffic Shadowing
- Kubernetes Ingress Controller
- Prometheus Exporter
- Peers & Stick Tables Improvements
- Power of Two Random Choices Algorithm
- Log Distribution & Sampling
- Built-in Automatic Profiling
- Enhanced TCP Fast Open
- New Request Actions
- New Converters
- New Fetches
- Miscellaneous Improvements
- LTS Support for 1.9 Features
Cloud-Native Threading & Logging
Tuning HAProxy for optimal performance is now even easier. Since version 1.8, you’ve been able to set the nbthread
directive to instruct HAProxy to operate across multiple threads, which allows you to make better use of multi-core processor machines. HAProxy now automatically configures this for you. It will, out of the box, set the number of worker threads to match the machine’s number of available CPU cores. That means that HAProxy can scale to accommodate any environment with less manual configuration.
You can still configure this yourself with the nbthread
directive, but this makes the task simpler. It also removes the burden of tuning this in cloud environments where machine instance sizes may be heterogeneous. On systems where HAProxy cannot retrieve the CPU affinity information it will default to a single thread.
This also simplifies the bind
line as it no longer requires you to specify a process
setting. Connections will be distributed to threads with the fewest active connections. Also, two new build parameters have been added: MAX_THREADS and MAX_PROCS, which avoid needlessly allocating huge structs. This can be very helpful on embedded devices that do not need to support MAX_THREADS=64.
Logging is now easier to adapt for containerized environments. You can log directly to stdout and stderr, or to a file descriptor. Use the following syntax:
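The original snippet was lost from this copy of the post; here is a minimal sketch of the new logging syntax (the facility and descriptor choices are illustrative):

```
global
    # Send logs to stdout in raw (non-syslog) format
    log stdout format raw daemon

    # Or send them to stderr, or to an arbitrary file descriptor:
    # log stderr format raw daemon
    # log fd@3 local0
```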
HTTP Representation (HTX)
The Native HTTP Representation (HTX) was introduced with HAProxy 1.9 and it laid the foundation that will allow HAProxy to continue to provide best-in-class performance while accelerating cutting-edge feature delivery for modern environments. Many of the latest features, such as end-to-end HTTP/2, gRPC, and Layer 7 retries, are powered by HTX.
HTX creates an internal, native representation of the HTTP protocol(s). It creates strongly typed, well-delineated header fields and allows for gaps and out-of-order fields. Modifying headers now consists simply of marking the old one as deleted and appending the new one to the end. This provides easy manipulation of any representation of the HTTP protocol, allows HAProxy to maintain consistent semantics from end-to-end, and provides higher performance when translating HTTP/2 to HTTP/1.1 or vice versa.
With HTX in place, any future HTTP protocols will be easier to integrate. It has matured since its introduction and starting in 2.0 it will be enabled by default.
End-to-End HTTP/2
With HTX now being on by default, HAProxy officially supports end-to-end HTTP/2. Here’s an example of how to configure it with TLS offloading. The bind
and server
lines include the alpn
parameter, which specifies a list of protocols that can be used in preferred order:
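The configuration example was stripped from this copy; the following sketch shows end-to-end HTTP/2 with TLS offloading (the section names, certificate path, and addresses are illustrative):

```
frontend fe_main
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    default_backend be_main

backend be_main
    server srv1 10.0.0.10:443 ssl verify none alpn h2,http/1.1 check
```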
You can also use HTTP/2 without TLS. Remove any ssl
and verify
parameters from the server
and/or bind
lines. Then swap alpn h2
for proto h2
and HAProxy will use only the given protocol.
gRPC
HAProxy 2.0 delivers full support for the open-source RPC framework, gRPC. It allows for bidirectional streaming of data, detection of gRPC messages, and logging gRPC traffic. The gRPC protocol is a modern, high-performance RPC framework that can run in any environment. Using Protocol Buffers, it’s able to serialize messages into a binary format that’s compact and potentially more efficient than JSON.
To begin using gRPC in HAProxy, you just need to set up a standard end-to-end HTTP/2 configuration. Here, we’re using the alpn
parameter to enable HTTP/2 over TLS:
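A sketch of such a configuration (names, ports, and certificate path are illustrative):

```
frontend fe_grpc
    bind :3001 ssl crt /etc/haproxy/certs/grpc.pem alpn h2
    default_backend be_grpc

backend be_grpc
    balance roundrobin
    server grpc1 10.0.0.20:3000 ssl verify none alpn h2 check
```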
Standard ACLs apply and allow for path-based matching, as shown:
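For example, a sketch that routes calls for the Greeter service (the canonical gRPC example, package "helloworld") to a dedicated backend; the ACL and backend names are made up:

```
acl is_greeter path_beg /helloworld.Greeter
use_backend be_greeter if is_greeter
```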
Additionally, two new converters, protobuf
and ungrpc
, have been introduced that let you extract the raw Protocol Buffers messages.
Layer 7 Retries
Reducing downtime often involves having smart contingency mechanisms in place. HAProxy has, since its inception, supported retrying a failed TCP connection by including the option redispatch
directive. With HAProxy 2.0, it can also retry from another server at Layer 7 for failed HTTP requests. The new configuration directive, retry-on
, can be used in a defaults
, listen
, or backend
section. The number of attempts at retrying can be specified using the retries
directive. It is important that you know how your application behaves with Layer 7 retries enabled. Caution must be exercised when retrying requests such as POST requests. In our examples we have disabled POST requests from being retried using http-request disable-l7-retry if METH_POST.
It supports a variety of error types to allow for granular control. Otherwise, you can specify all-retryable-errors
, which will retry the request for any error that is considered to be retryable. The full list of retry-on
options is below:
| Option | What it means |
| --- | --- |
| none | Never retry. |
| conn-failure | Retry when the connection or the TLS handshake failed. This is the default. |
| empty-response | Retry when the server connection was closed after part of the request was sent and nothing was received from the server. This type of failure may be caused by the request timeout on the server side, poor network conditions, or a server crash or restart while processing the request. |
| junk-response | Retry when the server returned something not looking like a complete HTTP response. This includes partial response headers as well as non-HTTP contents. It is usually a bad idea to retry on such events, which may be caused by a configuration issue such as having the wrong server port or by the request being rejected because it is potentially harmful to the server (a buffer overflow attack, for example). |
| response-timeout | The server timeout struck while waiting for the server to respond. This may be caused by poor network conditions, the reuse of an idle connection that has expired, or by the request being extremely expensive to process. It is generally a bad idea to retry on such events on servers dealing with heavy database processing (full scans, etc.) as it may amplify denial-of-service attacks. |
| 0rtt-rejected | Retry requests that were sent over TLS Early Data (0-RTT) and were rejected by the server. These requests are generally considered to be safe to retry. |
| <status> | Retry on any HTTP status code among 404 (Not Found), 408 (Request Timeout), 425 (Too Early), 500 (Server Error), 501 (Not Implemented), 502 (Bad Gateway), 503 (Service Unavailable), and 504 (Gateway Timeout). |
| all-retryable-errors | Retry for any error that is considered retryable. This is the same as if you specified conn-failure, empty-response, junk-response, response-timeout, 0rtt-rejected, 500, 502, 503, and 504. |
HAProxy 2.0 also introduces a new http-request
action called disable-l7-retry
that allows you to disable any attempt to retry the request if it fails for any reason other than a connection failure. This can be useful, for example, to make sure that POST requests aren’t retried.
Here’s an example configuration that activates Layer 7 retries:
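The example block was lost in extraction; a minimal sketch based on the directives described above (backend name, server address, and retry count are illustrative):

```
backend be_main
    retries 3
    retry-on all-retryable-errors
    # POST requests are not idempotent, so never retry them
    http-request disable-l7-retry if METH_POST
    server srv1 10.0.0.10:80 check
```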
Data Plane API
In today’s cloud-native landscape, ephemeral services are born and die quickly, deployments happen continuously, and configuration needs to be refreshed constantly. The new Data Plane API provides a modern REST API for configuring HAProxy on the fly. You can now dynamically add and remove frontends, backends, and servers. You can create ACL rules, insert HTTP routing directives, set IP and port bindings, and much more. The API updates the configuration file as needed, reloading the HAProxy process when necessary.
HAProxy has proven itself to be dynamic and extensible with its built-in Lua support and its Stream Processing Offload Engine. The new Data Plane API takes that one step further by providing true dynamic configuration management. The API daemon runs as a sidecar process, which HAProxy can manage using the program
directive in the new Process Manager. The HAProxy Data Plane API supports transactions, which allow multiple changes to be applied simultaneously. This gives you the ultimate confidence that updates are atomic.
- GitHub: https://github.com/haproxytech/dataplaneapi
- Documentation: https://www.haproxy.com/documentation/hapee/1-9r1/configuration/dataplaneapi/
- API Specification: https://www.haproxy.com/documentation/dataplaneapi/latest/
Process Manager
Several of the exciting innovations happening involve components that run as sidecar processes alongside HAProxy, such as the Data Plane API and any Stream Processing Offload Agents (SPOAs). Clearly, there’s a benefit to having central orchestration to control the lifecycle of these processes.
This release introduces support for the new Process Manager. It allows you to specify external binaries that HAProxy will start and manage directly under its master/worker mode. After enabling master/worker mode by either including the -Ws flag on the command line or adding a master-worker
directive to the global
section of the HAProxy configuration, you can tell HAProxy to start external programs by using the following syntax:
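The syntax block was stripped from this copy; the general shape of a program section is:

```
program <name>
    command <executable> [arguments...]
```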
For example, to have HAProxy handle the start up of the Data Plane API, you would add it as a command
in the program
section, like this:
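A sketch of such a program section (the binary path and flags are illustrative; consult the Data Plane API documentation for the exact options):

```
program api
    command /usr/local/bin/dataplaneapi --host 127.0.0.1 --port 5555
```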
You can view a list of running commands by issuing a show proc
command to the Runtime API:
Polyglot Extensibility
The Stream Processing Offload Engine (SPOE) and Stream Processing Offload Protocol (SPOP) were introduced in HAProxy 1.7. The goal was to create the extension points necessary to build upon HAProxy using any programming language. The initial examples were all C-based. Over time, the community saw a need to show how SPOE can be extended in any language, and a variety of libraries and examples were contributed. This opens the door to as many developers as possible.
In collaboration with our community, we’re excited to announce that libraries and examples are available in the following languages and platforms:
- C
- .NET Core
- Golang
- Lua
- Python
Traffic Shadowing
Traffic shadowing, or mirroring, allows you to mirror requests from one environment to another. This is helpful in instances where you would like to send a percentage of production traffic to a testing or staging environment to vet a release before it’s fully deployed. The new Traffic Shadowing daemon is written as a Stream Processing Offload Agent (SPOA) and takes advantage of HAProxy’s SPOE, which allows you to extend HAProxy using any programming language.
The Traffic Shadowing SPOA can be launched and managed using the Process Manager, as shown:
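The example was lost from this copy; a sketch of launching a mirroring agent under the Process Manager and attaching the SPOE filter to a frontend (the binary name, flags, mirror URL, and section names are illustrative):

```
program mirror
    command spoa-mirror --runtime 0 --mirror-url http://staging.local:8080

frontend fe_main
    bind :80
    filter spoe engine mirror config mirror.cfg
    default_backend be_main
```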
Above, we specified config mirror.cfg on the filter spoe
line. Here is an example of how mirror.cfg would look:
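The mirror.cfg contents were stripped from this copy; the following is a rough sketch of an SPOE configuration file (timeouts, message arguments, and the agent backend name are illustrative and depend on what the agent expects; be_mirror_agents would be a backend pointing at the SPOA, defined in the main configuration):

```
[mirror]
spoe-agent mirror
    log global
    messages mirror-req
    option var-prefix mirror
    timeout hello 500ms
    timeout idle 5s
    timeout processing 5ms
    use-backend be_mirror_agents

spoe-message mirror-req
    args arg_method=method arg_path=path arg_ver=req.ver arg_hdrs=req.hdrs_bin
    event on-frontend-http-request
```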
Kubernetes Ingress Controller
Since February 2017, an HAProxy Ingress Controller for Kubernetes has been provided by community contributor Joao Morais. HAProxy Technologies contributed features, such as adding DNS service discovery, and watched the evolution of the project. There was a need, however, for a community-driven project that's developed jointly by HAProxy Technologies.
The new HAProxy Kubernetes Ingress Controller provides a high-performance ingress for your Kubernetes-hosted applications. It supports TLS offloading, Layer 7 routing, rate limiting, whitelisting, and the best-in-class performance that HAProxy is renowned for. Ingresses can be configured through either ConfigMap resources or annotations and there’s support for defining secrets for storing TLS certificates.
- GitHub: https://github.com/haproxytech/kubernetes-ingress
- Documentation: https://www.haproxy.com/documentation/hapee/1-9r1/traffic-management/kubernetes-ingress-controller/
Prometheus Exporter
HAProxy now has native support for exposing metrics to Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit that was originally built at SoundCloud. Its adoption has been widespread and it inspires an active community.
To begin using the Prometheus exporter you must first compile HAProxy with the component by using the EXTRA_OBJS variable. An example make
command would be:
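The command itself was lost in extraction; for HAProxy 2.0 the exporter lives under contrib/, so the build would look something like (target as appropriate for your platform):

```shell
make TARGET=linux-glibc EXTRA_OBJS="contrib/prometheus-exporter/service-prometheus.o"
```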
Activate the exporter within your HAProxy configuration by adding an http-request use-service directive, like so:
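A sketch of such a frontend (the frontend name, port, and paths are illustrative):

```
frontend stats
    bind *:8404
    # Expose Prometheus metrics on /metrics
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats uri /stats
    stats refresh 10s
```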
Read more about the Prometheus integration on the blog post HAProxy Exposes a Prometheus Metrics Endpoint.
Peers & Stick Tables Improvements
HAProxy allows propagation of stick table data to other HAProxy nodes using the Peers Protocol. HAProxy 2.0 introduces several improvements to the Peers Protocol including:
- Heartbeat
- Stick tables in peers sections
- SSL support
- Runtime API command: show peers
- New stick table counters
- New stick table data type: server_name
A node now sends a heartbeat message to its peers after a three-second period of inactivity. If there isn’t any activity within a five-second period, the peer is considered dead, the connection is closed, and reconnection is attempted.
The peers
section has been expanded to allow using the bind
, default-bind
, server
, and default-server
configuration directives. It also now supports having stick tables directly within itself. This means that you no longer need to use dummy backends, as previously recommended, when dealing with many different stick tables.
In the following example, we define a stick table directly inside a peers
section and encrypt traffic between nodes using SSL:
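The example was stripped from this copy; a sketch of such a peers section (peer names, addresses, certificate path, and table parameters are illustrative; the local peer is the server line without an address):

```
peers mypeers
    bind :10000 ssl crt /etc/haproxy/certs/peers.pem
    default-server ssl verify none
    server haproxy1                       # local peer: no address needed
    server haproxy2 192.168.1.20:10000    # remote peer
    table sticktable type string size 100k expire 30s store http_req_rate(10s)
```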
Within a frontend
you would then specify the following:
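A sketch of the frontend side, referencing the table as <peers-section>/<table-name> (section and table names are illustrative):

```
frontend fe_main
    bind :80
    http-request track-sc0 src table mypeers/sticktable
    default_backend be_main
```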
Using the Runtime API, you can now get information about the various peers connections using show peers.
The stick table counters gpc1
and gpc1_rate
are additional, general-purpose counters that can be incremented using configuration logic. A new stick table data type, server_name, was added. It functions the same as server_id except that the server's name is exchanged over the wire in addition to its ID. To learn how to take advantage of stick tables, check out our blog post Introduction to HAProxy Stick Tables.
Power of Two Random Choices Algorithm
In the 1.9 release, a new load-balancing algorithm was added called random. It chooses a random number as the key for the consistent hashing function. Random load balancing can be useful with large farms or when servers are frequently added or removed, as it may avoid the hammering effect that could result from roundrobin
or leastconn
in this situation. It also respects server weights, and dynamic weight changes and server additions take effect immediately.
The hash-balance-factor
directive can be used to further improve the fairness of the load balancing, especially in situations where servers show highly variable response times.
When setting balance
to random, the argument <draws> indicates that HAProxy should draw that many random servers and then select the one that is least loaded. Drawing two servers even has a name: the Power of Two Random Choices algorithm.
Specify Power of Two load balancing within your backend as follows:
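The example was lost from this copy; a minimal sketch (backend name and addresses are illustrative):

```
backend be_main
    balance random(2)
    server srv1 10.0.0.10:80 check
    server srv2 10.0.0.11:80 check
    server srv3 10.0.0.12:80 check
```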
You can read more about this in our blog post Test Driving “Power of Two Random Choices” Load Balancing.
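The selection step itself can be sketched in a few lines of Python. This is a toy model for illustration only, not HAProxy's implementation; the server names and load numbers are made up:

```python
import random

def p2c_choose(servers, load, draws=2):
    """Draw `draws` servers uniformly at random and return the one with
    the fewest active connections (the 'Power of Two Random Choices'
    when draws == 2)."""
    candidates = random.sample(servers, draws)
    return min(candidates, key=lambda s: load[s])

# Illustrative pool: ten servers with made-up active-connection counts.
servers = [f"srv{i}" for i in range(10)]
load = {s: random.randint(0, 100) for s in servers}
picked = p2c_choose(servers, load)
```

Drawing all servers degenerates to plain least-connections; drawing one degenerates to pure random. Two draws is the sweet spot between balance quality and the cost of inspecting server state.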
Log Distribution & Sampling
When dealing with a high volume of logs, sampling can be extremely beneficial, giving you a random insight into the traffic. Typically, this sampling would need to be performed by a syslog server such as rsyslog. With HAProxy 2.0, you can now do sampling directly within HAProxy by using the log
directive’s sample parameter. Multiple log
and sample
directives can be specified simultaneously.
To get started, configure logging as follows:
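The configuration block was stripped from this copy; a sketch reconstructed from the description that follows (addresses and facility are illustrative):

```
global
    log stderr local0
    log 127.0.0.1:10001 sample 1:10 local0
    log 127.0.0.2:10002 sample 2-3,8-11:11 local0
```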
The first log line configures all local0 logs to be sent to stderr. The second log line configures logging to 127.0.0.1:10001 at a sampled rate. One out of 10 requests would be logged to this source. Sending 100 requests while incrementing the URL parameter i results in the following log entries:
The third log line configures logging to 127.0.0.2 on port 10002 at a sampled rate. For every 11 requests, it will log requests 2, 3, and 8-11. Sending 100 requests while incrementing the URL parameter i results in the following log entries:
Built-in Automatic Profiling
HAProxy now features the profiling.tasks
directive, which can be specified in the global
section. It takes the parameters auto, on, or off. It defaults to auto.
When set to auto, profiling automatically switches on when the process starts to suffer from an average latency of 1000 microseconds or higher, as reported in the avg_loop_us field of the show activity output, and automatically turns off when the latency returns below 990 microseconds. This value is an average over the last 1024 loops, so it does not vary quickly and tends to smooth out short spikes. It may also spontaneously trigger from time to time on overloaded systems, containers, or virtual machines, or when the system swaps (which must absolutely never happen on a load balancer).
To view the activity you can use the show activity
Runtime API command, as shown:
To view the status of profiling, use the show profiling
Runtime API command:
Profiling exposes the following fetches, which can be captured in the HAProxy log:
| Fetch method | Description |
| --- | --- |
| date_us | The microseconds part of the date. |
| cpu_calls | The number of calls to the task processing the stream or current request since it was allocated. It is reset for each new request on the same connection. |
| cpu_ns_avg | The average number of nanoseconds spent in each call to the task processing the stream or current request. |
| cpu_ns_tot | The total number of nanoseconds spent in each call to the task processing the stream or current request. |
| lat_ns_avg | The average number of nanoseconds spent between the moment the task handling the stream is woken up and the moment it is effectively called. |
| lat_ns_tot | The total number of nanoseconds between the moment the task handling the stream is woken up and the moment it is effectively called. |
To use these in the logs, you would either extend the default HTTP log-format
, like so:
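The example line was lost in extraction; a sketch that appends the profiling fetches to the default HTTP log format (labels are illustrative):

```
log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r cpu_calls:%[cpu_calls] cpu_ns_avg:%[cpu_ns_avg] lat_ns_avg:%[lat_ns_avg]"
```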
Or, extend the default TCP log-format
:
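A corresponding sketch based on the default TCP log format:

```
log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq cpu_calls:%[cpu_calls] cpu_ns_avg:%[cpu_ns_avg] lat_ns_avg:%[lat_ns_avg]"
```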
Enhanced TCP Fast Open
HAProxy now has end-to-end support for TCP Fast Open (TFO), enabling clients to send a request and receive a response during the TCP three-way handshake. The benefit of this is that you save one round-trip after the first connection.
HAProxy has supported TFO on the frontend since version 1.5. Version 2.0 enhances this by adding TFO for connections to backend servers on systems that support it. This requires Linux kernel 4.11 or newer. Add the tfo
parameter to a server
line.
Be sure to enable retries with the retry-on
directive or the request won’t be retried on failure.
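A sketch combining both pieces (backend name and address are illustrative):

```
backend be_main
    retry-on conn-failure empty-response response-timeout
    server srv1 10.0.0.10:80 tfo check
```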
New Request Actions
As a part of this release, several new http-request
and tcp-request
actions were introduced. Here is a breakdown of these new actions with their descriptions.
| Action | Description |
| --- | --- |
| http-request replace-uri <match-regex> <replace-fmt> | Matches the regular expression in the URI part of the request according to <match-regex> and replaces the match with the <replace-fmt> argument. |
| http-request do-resolve(<var>,<resolvers>,[ipv4,ipv6]) <expr> | Performs DNS resolution of the output of <expr> and stores the result in the variable <var>. It uses the DNS resolvers section pointed to by <resolvers>. It is possible to choose a resolution preference by using the optional arguments ipv4 or ipv6. |
| http-request disable-l7-retry | Disables any attempt to retry the request if it fails for any reason other than a connection failure. This can be useful, for example, to make sure POST requests aren't retried upon failure. |
| tcp-request content do-resolve(<var>,<resolvers>,[ipv4,ipv6]) <expr> | Performs DNS resolution of the output of <expr> and stores the result in the variable <var>. It uses the DNS resolvers section pointed to by <resolvers>. It is possible to choose a resolution preference by using the optional arguments ipv4 or ipv6. |
| tcp-request content set-dst <expr> | Sets the destination IP address to the value of the specified expression. |
| tcp-request content set-dst-port <expr> | Sets the destination port to the value of the specified expression. |
The http-request do-resolve
and tcp-request do-resolve
warrant further explanation. They allow you to resolve a DNS hostname and store the result in an HAProxy variable. Consider the following example:
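The example block was lost from this copy; a sketch reconstructed from the explanation that follows (the nameserver address, frontend/backend names, and variable name follow the surrounding text; the resolvers section contents are illustrative):

```
resolvers mydns
    nameserver dns1 10.0.0.53:53

frontend fe_main
    bind :80
    # Resolve the Host header and store the result in txn.dstip
    http-request do-resolve(txn.dstip,mydns,ipv4) hdr(Host),lower
    default_backend be_main

backend be_main
    # Route to whatever address the DNS lookup returned
    http-request set-dst var(txn.dstip)
    server s1 0.0.0.0:80
```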
Here, we’re using http-request do-resolve
to perform a DNS query on the hostname found in the Host request header. The nameserver(s) referenced in the mydns resolvers
section (not shown) will return the IP address associated with that hostname and HAProxy will then store it in the variable txn.dstip. The http-request set-dst
line in the be_mainbackend updates the server
address with this variable.
This is beneficial in split-horizon DNS environments, wherein the DNS server returns different results, such as publicly routable or internal-only addresses, depending on the source IP address of the client (which here is the load balancer). So, you could have Dev and Prod load balancers that receive different DNS records when they call do-resolve. This is much more dynamic than the at-runtime DNS resolution available in HAProxy (i.e. using the resolvers
parameter on the server
line), which is typically set to hold onto a DNS result for a period of time. As such, it’s also suitable for other scenarios involving highly dynamic environments, such as where upstream servers are ephemeral.
New Converters
Converters allow you to transform data within HAProxy and usually follow a fetch. The following converters have been added to HAProxy 2.0:
| Converter | Description |
| --- | --- |
| aes_gcm_dec | Decrypts the raw byte input using the AES128-GCM, AES192-GCM or AES256-GCM algorithm. |
| protobuf | Extracts the raw field of an input binary sample representation of a Protocol Buffers message. |
| ungrpc | Extracts the raw field of an input binary sample representation of a gRPC message. |
New Fetches
Fetches in HAProxy provide a source of information from either an internal state or from layers 4, 5, 6, and 7. New fetches that you can expect to see in this release include:
| Fetch | Description |
| --- | --- |
| ssl_fc_client_random | Returns the client random of the front connection when the incoming connection was made over an SSL/TLS transport layer. It is useful to decrypt traffic sent using ephemeral ciphers. This requires OpenSSL >= 1.1.0, or BoringSSL. |
| ssl_fc_server_random | Returns the server random of the front connection when the incoming connection was made over an SSL/TLS transport layer. It is useful to decrypt traffic sent using ephemeral ciphers. This requires OpenSSL >= 1.1.0, or BoringSSL. |
| ssl_bc_client_random | Returns the client random of the back connection when the outgoing connection was made over an SSL/TLS transport layer. It is useful to decrypt traffic sent using ephemeral ciphers. This requires OpenSSL >= 1.1.0, or BoringSSL. |
| ssl_bc_server_random | Returns the server random of the back connection when the outgoing connection was made over an SSL/TLS transport layer. It is useful to decrypt traffic sent using ephemeral ciphers. This requires OpenSSL >= 1.1.0, or BoringSSL. |
Miscellaneous Improvements
The following miscellaneous improvements have been made:
- SSL/TLS Ticket Keys: TLS session tickets help speed up session resumption for clients that support them. HAProxy 2.0 adds support for AES 256-bit ticket keys specified either in a file or through the Runtime API.
- Core Dump Ease of Use: A new global directive, set-dumpable, aids in enabling core dumps. It has been notoriously difficult to get a core dump when the user and group settings are enabled (they clear the dumpable flag on Linux), when using a chroot, and/or when HAProxy is started by a service management tool that requires complex operations just to raise the core dump limit. This directive makes it much easier to retrieve a core file.
- SOCKS4 Support: Introduces two new server keywords, socks4 and check-via-socks4, which let you communicate with servers within a backend over SOCKS4 and add similar functionality for health checking over SOCKS4.
LTS Support for 1.9 Features
HAProxy 2.0 brings LTS support for the aforementioned features, as well as the following features that were introduced or improved upon during the 1.9 release:
- Small Object Cache with an increased caching size of up to 2GB, set with the max-object-size directive. The total-max-size setting determines the total size of the cache and can be increased up to 4095MB.
- New fetches that report either an internal state or data from layers 4, 5, 6, and 7.
- New converters that allow you to transform data within HAProxy.
- HTTP 103 (Early Hints), which asks the browser to preload resources.
- Server Queue Priority Control, which lets you prioritize some queued connections over others.
- Connection pooling to backend servers.
- The resolvers section supports using resolv.conf by specifying parse-resolv-conf.
- The busy-polling directive allows the reduction of request processing latency by 30-100 microseconds on machines using frequency scaling or supporting deep idle states.
- Lua:
  - The Server class gained the ability to change a server's maxconn value.
  - The TXN class gained the ability to adjust a connection's priority within the server queue.
  - A new StickTable class allows access to the content of a stick-table by key and allows dumping of content.
- Regression testing of the HAProxy code using varnishtest.
HAProxy 2.1 Preview
HAProxy 2.1 will build on the foundation that has been laid in HAProxy 1.9 and 2.0. Some of the exciting features planned are:
- UDP Support
- OpenTracing
- Dynamic SSL Certificate Updates
Conclusion
HAProxy remains at the forefront of performance and innovation because of the commitment of the open-source community and the staff at HAProxy Technologies. We’re excited to bring you this news of the 2.0 release! In addition to the features included in this version, it paves the way for many exciting updates, which, with our new release cadence, you’ll see more frequently.
It immediately brings support for end-to-end HTTP/2, gRPC, Layer 7 Retries, traffic shadowing, connection pooling on the server side, a Process Manager, the Power of Two Random Choices Algorithm, and a Prometheus Exporter. Of course, one of the most powerful additions is the new Data Plane API, which allows you to dynamically configure HAProxy using RESTful HTTP calls.
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 .mobi: http://www.t00y.com/file/79497801
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2.pdf: http://www.t00y.com/file/80342441
Hadoop.Cluster.Deployment.pdf: http://www.t00y.com/file/80342444
Hadoop: The Definitive Guide.pdf: http://www.t00y.com/file/80342478
Pro Apache Hadoop.pdf: http://www.t00y.com/file/80342492
Getting-Started-With-Storm.pdf: http://www.t00y.com/file/76668750
Learning Storm.pdf: http://www.t00y.com/file/76668827
Real-time Stream Processing and Visualization Using Kafka, Storm.pdf: http://www.t00y.com/file/76668918
Realtime Processing with Storm Presentation.pdf: http://www.t00y.com/file/76669000
Storm Blueprints- Patterns for Distributed Real-time Computation.pdf: http://www.t00y.com/file/76670686
CentOS 7 high-performance Linux cluster: installing, configuring, and using HAProxy via yum, with a detailed look at the HAProxy configuration file
Screenshot reference: https://blog.csdn.net/weixin_...
HAProxy configuration
Permanently set the hostnames so the virtual machines are easy to tell apart:
Haproxy: hostnamectl set-hostname haproxy
Web1: hostnamectl set-hostname WEB1
Web2: hostnamectl set-hostname WEB2
Test: hostnamectl set-hostname test
1. Install HAProxy and screenshot the process
yum list | grep haproxy — lists the available haproxy packages; the query may take quite a while.
yum -y install haproxy — install:
2. Fill in the IP allocation table:

| Server role | IP address |
| --- | --- |
| Haproxy | 192.168.1*8.128 |
| Web1 | 192.168.1*8.137 |
| Web2 | 192.168.1*8.138 |
| test | 192.168.1*8.135 |
3. Short answer: what are the roles of the global, defaults, frontend, and backend sections in the HAProxy configuration file?
global:
sets process-level global parameters, usually related to the operating system.
defaults:
the default-parameter section; settings made here are automatically inherited by the frontend, backend, and listen sections that follow.
frontend:
defines the front-end virtual node that receives user requests; a frontend can select a backend directly based on ACL rules.
backend:
defines the back-end server cluster, i.e. a group of real servers that handle requests from the front end.
4. Configure HAProxy and screenshot the result
Edit the /etc/haproxy/haproxy.cfg configuration file with the following content:
Location: /etc/haproxy/haproxy.cfg
vim /etc/haproxy/haproxy.cfg
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend main *:80
    acl url_static path_beg -i /static /images /javascript /stylesheets
    acl url_static path_end -i .jpg .gif .png .css .js
    use_backend static if url_static
    default_backend web

backend static
    balance roundrobin
    server static 127.0.0.1:4331 check

backend web
    balance roundrobin
    server web1 192.168.158.137:80 check
    server web2 192.168.158.138:80 check
Save and exit.
3. Start haproxy
Before starting, check the configuration syntax with:
haproxy -f /etc/haproxy/haproxy.cfg -c
Here -f specifies the configuration file, and -c tells haproxy to only validate the configuration and exit, without actually starting.
(Taking a configuration file on the command line is a common daemon pattern; compare nginx: /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf)
If it prints "Configuration file is valid", you are good to go.
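The check-then-start flow above can be wrapped in a small script. This is only a sketch: the check and reload commands are passed in as parameters (so the flow can be exercised without a live HAProxy install); in real use you would pass in the haproxy and systemctl commands shown above.

```shell
#!/bin/sh
# Run the reload command only if the configuration check succeeds.
# Usage: check_and_reload "<check command>" "<reload command>"
check_and_reload() {
    if $1; then
        $2
        echo "reloaded"
    else
        echo "config invalid, not reloading" >&2
        return 1
    fi
}

# Real usage (paths as in this article) would be:
#   check_and_reload "haproxy -c -f /etc/haproxy/haproxy.cfg" "systemctl reload haproxy"
```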
Start haproxy (disable the firewall first):
systemctl start haproxy (the SysV-style service haproxy start command does not work here)
If you want extra assurance, test from another virtual machine:
for i in {1..40}; do curl 192.168.158.128; done
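If both web servers are up, that loop should return the two pages alternately, because balance roundrobin cycles through the servers in order. The alternation itself can be sketched offline (purely illustrative; HAProxy's real scheduler also honors server weights):

```shell
# Simulate round-robin selection over two servers for six requests
servers="web1 web2"
count=$(echo $servers | wc -w)      # number of servers
result=""
i=0
for req in 1 2 3 4 5 6; do
    idx=$(( i % count + 1 ))        # 1-based index, cycling through the list
    pick=$(echo $servers | cut -d' ' -f$idx)
    result="$result $pick"
    i=$(( i + 1 ))
done
echo "$result"
```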
Installing this way puts every file in a standard path, which makes them easy to locate and inspect; note down the paths.
HAProxy Configuration File Explained
The configuration file is organized by function into five sections: global, defaults, frontend, backend, and listen.
1) global: sets global, process-level parameters, usually related to the operating system environment
2) defaults: default parameters; settings made here are automatically inherited by the frontend, backend, and listen sections below
3) frontend: defines the frontend virtual node that receives user requests; a frontend can select a backend directly based on ACL rules
4) backend: defines a backend server cluster, i.e. a group of real servers that handle the requests coming from the frontend
5) listen: a combination of a frontend and a backend in a single section
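Since listen merges a frontend and a backend, the frontend/backend pair from the yum-install example earlier could equivalently be written as one listen section (a sketch reusing the same addresses):

```
listen web_proxy
    bind *:80
    mode http
    balance roundrobin
    server web1 192.168.158.137:80 check
    server web2 192.168.158.138:80 check
```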
Directive reference:
1) global section
global
    log 127.0.0.1 local0 info  # global log configuration: local0 is the syslog facility and info the log level (one of err, warning, info, debug). This sends logs to the rsyslog service on 127.0.0.1, facility local0, at level info
    maxconn 4096  # maximum number of concurrent connections each HAProxy process will accept
    user nobody  # user and group the HAProxy process runs as
    group nobody
    daemon  # run HAProxy in the background; this is the recommended mode
    nbproc 1  # number of processes HAProxy starts; requires daemon mode. Defaults to one process; keep it at or below the number of CPU cores
    pidfile /usr/local/haproxy/logs/haproxy.pid  # where the HAProxy process ID is stored
2) defaults section
defaults
    mode http  # default mode of the HAProxy instance: tcp, http, or health. In tcp mode a full-duplex connection is relayed between client and server with no layer-7 inspection; this is the default and is often used for SSL, SSH, SMTP and similar protocols. In http mode client requests are deeply analyzed before being forwarded, and any request that does not comply with the RFC format is rejected
    retries 3  # number of retries when a connection to a backend server fails; once exceeded, HAProxy marks the server unavailable
    timeout connect 10s  # maximum time to wait for a connection to a server to succeed; milliseconds by default, other time units may be given as suffixes
    timeout client 20s  # maximum time to wait for the client to send data; milliseconds by default, other time units may be given as suffixes
    timeout server 30s  # maximum time to wait for the server to respond to the client; milliseconds by default, other time units may be given as suffixes
    timeout check 5s  # health-check timeout for backend servers; milliseconds by default, other time units may be given as suffixes
3) frontend section
frontend www  # the frontend keyword defines a frontend virtual node named "www"
    bind *:80  # defines one or more listening sockets; allowed only in frontend and listen sections. Format: bind [<address>:[port_range]] [interface]
    mode http
    option httplog  # HAProxy does not log HTTP requests by default; this option enables HTTP request logging
    option forwardfor  # lets backend servers log the client's real IP
    option httpclose  # HAProxy actively closes the TCP connection once a request/response cycle completes; this parameter can help performance considerably
    log global  # use the log settings defined in the global section
    default_backend htmpool  # default backend server pool for this frontend
4) backend section
backend htmpool
    mode http
    option redispatch  # used with cookie persistence. By default HAProxy inserts the serverID of the chosen backend server into a cookie to keep session persistence; if that server fails, the client's cookie is not refreshed and access would break. With this option set, the client's request is forced over to another healthy backend server so service continues
    option abortonclose  # under high load, automatically terminate long-pending connections in the queue
    balance roundrobin  # load-balancing algorithm
    cookie SERVERID  # insert a SERVERID cookie; each server's ID is set with the cookie keyword on its server line below
    option httpchk GET /index.php  # enable HTTP health checking
    server web1 10.1.1.1:80 cookie server1 weight 6 check inter 2000 rise 2 fall 3
    server web2 10.1.1.2:80 cookie server2 weight 6 check inter 2000 rise 2 fall 3  # server lines define the real backend servers; they cannot appear in frontend sections
5) listen section
listen admin_status  # this listen section configures the HAProxy statistics page
    bind 0.0.0.0:9188
    mode http
    log 127.0.0.1 local0 err
    stats refresh 30s  # auto-refresh interval of the statistics page
    stats uri /haproxy-status  # URI path of the statistics page
    stats realm Welcome login  # prompt shown in the login dialog of the statistics page
    stats auth admin:admin  # username and password for the statistics page, separated by a colon; multiple entries allowed, one per line
    stats hide-version  # hide the HAProxy version on the statistics page
    stats admin if TRUE  # allow enabling/disabling backend servers from the statistics page; available since version 1.4.9
6. Load-balancing algorithms supported by HAProxy:
roundrobin: weighted round-robin scheduling
static-rr: weighted round-robin, but static; changing a server's weight at runtime has no effect
source: hashes the source IP of the request and maps the result against the total weight of the backend servers; requests from the same client IP always reach the same backend server
leastconn: forwards new connections to the backend server with the fewest connections; recommended for long-lived sessions, such as database load balancing
uri: hashes part or all of the URI and maps the result against the total server weight to pick a backend server
url_param: forwards based on a parameter in the URL, so that as long as the set of backend servers is unchanged, the same user's requests always land on the same machine
hdr: forwards based on an HTTP header; if the named header is absent, falls back to roundrobin
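The hash-based algorithms (source, uri) share one pattern: hash some request attribute, then map the hash onto the server set, so the same input always lands on the same server. A toy illustration of the source idea (this is not HAProxy's actual hash function; octet-summing is made up purely to show the stability property):

```shell
# Pick a server index for a client IP: sum the octets, then mod by server count
pick_server() {
    ip=$1
    nservers=$2
    sum=0
    for octet in $(echo "$ip" | tr '.' ' '); do
        sum=$(( sum + octet ))
    done
    echo $(( sum % nservers ))
}

pick_server 192.168.1.10 2   # 192+168+1+10 = 371; 371 % 2 = 1
pick_server 192.168.1.10 2   # same IP, same server: the mapping is stable
```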
com.sun.proxy.$Proxy4 cannot be cast exception
return Proxy.newProxyInstance(connection.getClass().getClassLoader(), Connection.class.getInterfaces(), new InvocationHandler() {
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result = null;
        if (method.getName().equals("close")) {
            // fetch the connection pool
            Set<Connection> set = DBCPMyPoolUtil.getSet();
            if (set.size() < maxIdle) {
                // below the maximum idle count: return the connection to the pool
                set.add(connection);
                System.out.println("recycled connection: " + connection);
            } else {
                // pool is full: really close the connection; it will not be
                // reused or saved back into the pool
                if (connection != null) {
                    // method is close here, so this closes the underlying connection
                    result = method.invoke(connection, args);
                }
                System.out.println("closed conn: " + connection);
            }
            int active = DBCPMyPoolUtil.getActive();
            active--;
            DBCPMyPoolUtil.setActive(active);
        } else {
            result = method.invoke(connection, args);
        }
        return result;
    }
});
Error: java.lang.ClassCastException: com.sun.proxy.$Proxy4 cannot be cast to java.sql.Connection
I hit this exception while using a dynamic proxy to enhance the close method of a Connection object. Searching around, the cause lies in the MySQL database driver being used: with different drivers, Connection.class.getInterfaces() returns different results. It returns a Class[] array, and only when the first element of that array is Connection can the generated proxy be cast to a Connection object; otherwise the cast throws.
Solution:
Replace Connection.class.getInterfaces() with new Class[] { Connection.class }. Then, whatever version of the database driver is in use, the cast is guaranteed to succeed.
return Proxy.newProxyInstance(connection.getClass().getClassLoader(), new Class[] { Connection.class }, new InvocationHandler() {
    ...
});
Compare with Haproxy and Nginx_tcp_proxy_module