Hystrix

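Problems encountered in distributed systems

In a distributed system, service A may call service B, which in turn calls service C and service D. This pattern of calls is known as service fan-out.
If one service on such a fan-out call chain responds too slowly or throws exceptions, its callers tie up more and more resources, and eventually the whole system crashes. This process is called a service avalanche, or cascading failure.

Solving the problem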

Three steps to application fault tolerance

1. Timeouts

Set connect and read timeouts on our RestTemplate:

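import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Bean
public RestTemplate restTemplate() {
    // Configure the RestTemplate timeouts (both values are in milliseconds)
    SimpleClientHttpRequestFactory requestFactory = new SimpleClientHttpRequestFactory();
    requestFactory.setReadTimeout(2000);    // max wait for response data
    requestFactory.setConnectTimeout(2000); // max wait to establish the connection

    return new RestTemplate(requestFactory);
}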

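Catching call exceptions

@RequestMapping("/ribbon")
public String ribbon() {
    // Call the service by its registered instance name
    try {
        return restTemplate.getForObject("http://EUREKA-CLIENT2/eurekaClient2Test", String.class);
    } catch (Exception e) {
        // Any failure, including a timeout, is turned into a BaseException
        throw new BaseException(0, "call timed out");
    }
}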

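Global exception handling

import java.util.HashMap;
import java.util.Map;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.ResponseBody;

@ControllerAdvice
public class BaseExceptionHandler {
    // Turn every BaseException into a fixed fallback response body
    @ExceptionHandler(value = BaseException.class)
    @ResponseBody
    public Object dealException() {
        Map<String, Object> map = new HashMap<>();
        map.put("userName", "fallback user");
        map.put("result", "null");
        return map;
    }
}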

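2. Bulkhead isolation

It helps to first look at how a ship's hull is built. Modern ships are generally divided into many compartments, separated by welded steel plates and isolated from one another. Even if one or several compartments flood, the others are unaffected, the ship keeps enough buoyancy, and it does not sink.
The bulkhead pattern in software engineering can be understood like this: class M uses thread pool 1 and class N uses thread pool 2, so the two never share a pool, and each class's pool is given a fixed size, for example coreSize=10. Suppose class M calls service B and class N calls service C. If M and N shared one thread pool, then when service B goes down while M's calls to B are highly concurrent, and you have no protection in place, your service would very likely be dragged down by M. If M and N each have their own thread pool, then when B goes down, M can at most fill up its own pool; N's pool is unaffected, so N keeps working normally.
The idea: don't put all your eggs in one basket. You have your thread pool and I have mine; if yours fills up, or you go down, it has nothing to do with me.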
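
The bulkhead idea can be sketched with nothing but the JDK. The class below is a hypothetical illustration, not Hystrix code: `BulkheadSketch`, `poolB`, `poolC`, and the pool size of 2 are all assumptions made for the example. Calls to service B hang and saturate their own pool, while a call to service C still completes on a separate pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one fixed-size thread pool per downstream dependency.
class BulkheadSketch {
    static String callCWhileBHangs() {
        ExecutorService poolB = Executors.newFixedThreadPool(2); // only for calls to service B
        ExecutorService poolC = Executors.newFixedThreadPool(2); // only for calls to service C
        CountDownLatch serviceBRecovers = new CountDownLatch(1);
        try {
            // Simulate service B hanging: more blocked calls than poolB has threads.
            for (int i = 0; i < 5; i++) {
                poolB.submit(() -> {
                    serviceBRecovers.await();
                    return "B ok";
                });
            }
            // Calls to service C use their own pool, so they are still served.
            Future<String> c = poolC.submit(() -> "C ok");
            return c.get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            return "C failed: " + e;
        } finally {
            serviceBRecovers.countDown(); // release the simulated hang
            poolB.shutdown();
            poolC.shutdown();
        }
    }
}
```

If both kinds of call were submitted to `poolB`, the call to C would sit in the queue behind the blocked B calls and time out.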

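3. Circuit breaker

Everyone is familiar with real-world circuit breakers; every home has one. A circuit breaker monitors the circuit in real time and trips when it detects abnormal current, which keeps the wiring from burning out.
A software circuit breaker can be understood like this: it monitors the application in real time, and if the failure count or failure rate within a given period reaches a threshold, it "trips" and the breaker opens. While it is open, requests return immediately instead of running the logic they would normally invoke.
After being tripped for a while (for example 15 seconds), the breaker enters the half-open state. This is a transient state in which a single request is allowed to run the real logic. If it succeeds, the breaker closes and calls proceed normally; if it still fails, the breaker returns to the open state and tries half-open again after another wait. Tripping lets the application protect itself and avoid wasting resources, while the half-open design gives it "self-healing".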
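
The closed, open, and half-open states described above can be sketched as a small state machine. This is a hypothetical, simplified illustration and not how Hystrix is implemented: the class name `SimpleCircuitBreaker` is made up, it trips on a count of consecutive failures rather than a failure rate, and the current time is passed in as a parameter so the behaviour can be checked without sleeping.

```java
// Hypothetical minimal circuit breaker; an illustration, not Hystrix's implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before tripping
    private final long sleepWindowMillis; // how long to stay open before a trial call
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Returns true if the caller may run the real logic at time 'now' (milliseconds).
    synchronized boolean allowRequest(long now) {
        if (state == State.OPEN && now - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // let exactly one trial request through
            return true;
        }
        return state == State.CLOSED;
    }

    // A successful call (including the half-open trial) closes the breaker.
    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    // A failed trial, or too many failures while closed, opens the breaker.
    synchronized void recordFailure(long now) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A caller would check `allowRequest` before each call, return a fallback value when it is false, and report the outcome through `recordSuccess` or `recordFailure`.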

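What is Hystrix?

Hystrix (the porcupine) is a latency and fault-tolerance library open-sourced by Netflix. It isolates access to remote systems, services, and third-party libraries and prevents cascading failures, which improves a system's availability and fault tolerance.

What can Hystrix do?

Circuit breaking and degradation (fallback)

1. Wrapping requests

Create a project
Import the dependency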

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    <version>2.2.7.RELEASE</version>
</dependency>

Add the annotation to the main configuration class

@EnableCircuitBreaker

Custom command code

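import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.springframework.web.client.RestTemplate;

public class MyCommand extends HystrixCommand<String> {

    private final RestTemplate restTemplate;

    // The wrapped remote call
    @Override
    protected String run() throws Exception {
        return restTemplate.getForObject("http://EUREKA-CLIENT2/eurekaClient2Test", String.class);
    }

    // Invoked when run() fails, times out, or is rejected
    @Override
    protected String getFallback() {
        System.out.println("fallback method triggered ========================>");
        return "fallback method triggered";
    }

    // Constructor: takes the command group key and the RestTemplate used by run()
    public MyCommand(String commandGroupKey, RestTemplate restTemplate) {
        super(HystrixCommandGroupKey.Factory.asKey(commandGroupKey));
        this.restTemplate = restTemplate;
    }
}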

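How to call it

@Autowired
private RestTemplate restTemplate;

@RequestMapping("/hystrixHello")
public String hystrixHello() {
    // Build the command; a HystrixCommand instance can only be executed once
    MyCommand myCommand = new MyCommand("orderGroupKey", restTemplate);
    // execute() blocks until the result or the fallback value is available
    return myCommand.execute();
}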
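
To make the idea of "wrapping a request" concrete without the Hystrix dependency, here is a minimal JDK-only sketch of what a command conceptually does: run the call on another thread, wait up to a timeout, and return a fallback value on timeout or error. `CommandSketch` and its `execute` signature are assumptions made for illustration; the real `HystrixCommand` additionally applies circuit breaking, thread pool limits, and metrics.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of "run with timeout, else fall back"; not the Hystrix API.
class CommandSketch {
    static <T> T execute(Callable<T> run, Supplier<T> fallback, long timeoutMillis) {
        // A dedicated worker thread keeps a slow call off the caller's thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> future = worker.submit(run);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Timeout, interruption, or an exception thrown by run(): degrade.
            future.cancel(true);
            return fallback.get();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

For example, `CommandSketch.execute(() -> remoteCall(), () -> "fallback", 1000)` would return `"fallback"` if the hypothetical `remoteCall()` took longer than one second or threw.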

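2. Trip mechanism

Trip thresholds under Hystrix's default configuration

1: Tripping on outage

Start the service registry and the service consumer (leave the service provider stopped to simulate an outage)

2: Tripping on timeout

Add a thread sleep to the called method on the service provider side
// degrade on timeout
Thread.sleep(7000);

3: Tripping on exception

Query a record that does not exist to simulate a thrown exception

Testing the open and half-open states

Take tripping on exception as an example. Within the time window, keep sending failing requests until their number reaches requestVolumeThreshold, then look at Hystrix's state and at how calls behave.
Once the failures within the window reach the configured requestVolumeThreshold, calls go straight to the fallback method.
At this point, check Hystrix's monitoring information again.
Wait until the breaker is half-open and then issue a valid query; the breaker closes and normal calls resume.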

[Figure: the three state transitions of the trip mechanism]

3. Resource isolation

Thread isolation
Semaphore isolation
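
The two modes differ in where the call runs. With thread isolation the call runs on a thread from a dedicated pool; with semaphore isolation it runs on the caller's own thread and a counter simply caps how many calls may be in flight. The sketch below illustrates the semaphore variant with the JDK's `Semaphore`. `SemaphoreIsolationSketch` is a hypothetical class written for this example, not part of Hystrix.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of semaphore isolation; not the Hystrix implementation.
class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    // Runs on the caller's thread; rejects immediately when no permit is free.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // rejected: too many concurrent calls
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get(); // the call itself failed
        } finally {
            permits.release();
        }
    }
}
```

Because no thread hand-off happens, this variant is cheap, but it cannot time out a call that is already running, which is why it suits only small, fast units of work.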

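4. Monitoring

5. Self-healing

Integrating Feign with Hystrix:

Feign supports Hystrix out of the box, but since the Spring Cloud Dalston release this support has been disabled by default, because not every application needs it.
To use it you must first turn it on by adding the following to the yml file: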

feign:
  hystrix:
    enabled: true

With the configuration in place, how do we write the fallback method?

@FeignClient(value = "SERVER-POWER",fallback = PowerServiceFallBack.class)
public interface PowerServiceClient {

    @RequestMapping("/power.do")
    public Object power(@RequestParam("name") String name);

}

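The Feign client annotation has an attribute called fallback, which points to a class, PowerServiceFallBack:

@Component
public class PowerServiceFallBack implements PowerServiceClient {
    @Override
    public Object power(String name) {
        return R.error("test fallback");
    }
}

That completes the method fallback.
You may also need the specific error information, in which case you can write it like this: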

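import feign.hystrix.FallbackFactory;
import org.springframework.stereotype.Component;

@Component
public class PowerServiceClientFallBackFactory implements FallbackFactory<PowerServiceClient> {
    @Override
    public PowerServiceClient create(Throwable throwable) {
        return new PowerServiceClient() {
            @Override
            public Object power(String name) {
                // The cause of the failure is available through the throwable
                String message = throwable.getMessage();
                return R.error("feign fallback: " + message);
            }
        };
    }
}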

Then just specify a fallbackFactory on the client:

@FeignClient(value = "SERVER-POWER",fallbackFactory = PowerServiceClientFallBackFactory.class)
public interface PowerServiceClient {

    @RequestMapping("/power.do")
    public Object power(@RequestParam("name") String name);

}

The message variable holds the error information that was captured.
That completes the integration of Feign and Hystrix.

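Hystrix configuration:

Execution properties

hystrix.command.default.execution.isolation.strategy Isolation strategy. Default THREAD; options are THREAD | SEMAPHORE.

hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds Command execution timeout. Default 1000 ms.

hystrix.command.default.execution.timeout.enabled Whether execution has a timeout at all. Default true.

hystrix.command.default.execution.isolation.thread.interruptOnTimeout Whether to interrupt the execution when a timeout occurs. Default true.

hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests Maximum number of concurrent requests. Default 10. This only takes effect with the ExecutionIsolationStrategy.SEMAPHORE strategy. Once the maximum is reached, further requests are rejected. In principle a semaphore size is chosen the same way as a thread pool size, but with a semaphore each unit of work should be small and fast (millisecond level); otherwise use THREAD. The semaphore should be only a small fraction of the container's (Tomcat's) thread pool.

Fallback properties

These parameters apply to both the THREAD and SEMAPHORE strategies.

hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests If the number of concurrent fallback calls reaches this value, further requests are rejected with an exception and the fallback is not invoked. Default 10.

hystrix.command.default.fallback.enabled Whether to attempt a call to hystrixCommand.getFallback() when execution fails or a request is rejected. Default true.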

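Circuit Breaker properties

hystrix.command.default.circuitBreaker.enabled Whether a circuit breaker is used to track the health of the circuit and short-circuit requests when it is unhealthy. Default true.

hystrix.command.default.circuitBreaker.requestVolumeThreshold Minimum number of requests in one rolling window before the circuit can trip. With a value of 20, if only 19 requests arrive within one rolling window (say the window is 10 seconds), the circuit will not trip even if all 19 fail. Default 20.

hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds How long the circuit stays open after tripping. With a value of 5000, all requests are rejected for 5000 ms after the circuit trips; only after that does the breaker go half-open and let a trial request through, closing the circuit if it succeeds. Default 5000.

hystrix.command.default.circuitBreaker.errorThresholdPercentage Error rate threshold. If the error rate is >= this value, the circuit opens and all requests are short-circuited to the fallback. Default 50.

hystrix.command.default.circuitBreaker.forceOpen Forces the breaker open. When set, all requests are rejected. Default false.

hystrix.command.default.circuitBreaker.forceClosed Forces the breaker closed. When set, the circuit stays closed and circuitBreaker.errorThresholdPercentage is ignored. Default false.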
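
How requestVolumeThreshold and errorThresholdPercentage combine can be written as a small decision function. This is a sketch of the rule as described in the property list above, with a hypothetical class and method name; it is not copied from Hystrix.

```java
// Hypothetical sketch of the trip decision for one rolling window.
class TripDecisionSketch {
    // Returns true when the circuit should open for the current rolling window.
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        // Too little traffic to judge: never trip below the volume threshold.
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false;
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the defaults (20 and 50), 19 failures out of 19 requests do not trip the circuit, while 10 failures out of 20 requests do.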

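Metrics properties

hystrix.command.default.metrics.rollingStats.timeInMilliseconds Length of the statistics window in milliseconds. Whether the circuit opens is computed from the statistics of one rolling window. With a rolling window of 10000 ms, the window is split into n buckets, each holding counts of successes, failures, timeouts, and rejections. Default 10000.

hystrix.command.default.metrics.rollingStats.numBuckets Number of buckets a rolling window is divided into. With numBuckets = 10 and a rolling window of 10000 ms, each bucket covers 1 second. The values must satisfy rolling window % numBuckets == 0. Default 10.

hystrix.command.default.metrics.rollingPercentile.enabled Whether execution latencies are tracked and computed as percentiles. Default true.

hystrix.command.default.metrics.rollingPercentile.timeInMilliseconds Length of the rolling percentile window. Default 60000.

hystrix.command.default.metrics.rollingPercentile.numBuckets Number of buckets in the rolling percentile window; the same logic as above applies. Default 6.

hystrix.command.default.metrics.rollingPercentile.bucketSize Maximum number of executions kept per bucket. With a bucket size of 100 and a bucket covering 10 s, if 500 executions occur in those 10 s, only the last 100 are recorded in the bucket. Raising this value increases memory use and sorting cost. Default 100.

hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds Interval between health snapshots (used to compute success and error rates). Default 500 ms.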
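
The relationship between the rolling window and its buckets can be illustrated with a small counter. The class below is a hypothetical simplification that tracks only failures and takes the current time as a parameter; `RollingCounterSketch` and its methods are names invented for the example, not Hystrix classes.

```java
import java.util.Arrays;

// Hypothetical sketch of a bucketed rolling window that counts failures only.
class RollingCounterSketch {
    private final long bucketMillis;
    private final long[] bucketStart; // start time of the period each slot currently holds
    private final int[] failures;     // failure count per slot

    RollingCounterSketch(long windowMillis, int numBuckets) {
        // Same constraint as the property list: window % numBuckets == 0.
        if (numBuckets <= 0 || windowMillis % numBuckets != 0) {
            throw new IllegalArgumentException("window must be evenly divisible by numBuckets");
        }
        this.bucketMillis = windowMillis / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.failures = new int[numBuckets];
        Arrays.fill(bucketStart, -1); // -1 marks a slot that has never been used
    }

    long bucketMillis() {
        return bucketMillis;
    }

    void recordFailure(long now) {
        long start = now - now % bucketMillis;
        int slot = (int) ((now / bucketMillis) % failures.length);
        if (bucketStart[slot] != start) { // slot holds an expired bucket: reuse it
            bucketStart[slot] = start;
            failures[slot] = 0;
        }
        failures[slot]++;
    }

    // Sums the buckets whose start time lies within one window length of 'now'.
    int failuresInWindow(long now) {
        long windowMillis = bucketMillis * failures.length;
        int sum = 0;
        for (int i = 0; i < failures.length; i++) {
            if (bucketStart[i] >= 0 && now - bucketStart[i] < windowMillis) {
                sum += failures[i];
            }
        }
        return sum;
    }
}
```

With a 10000 ms window and 10 buckets each bucket covers 1000 ms, and old failures drop out of the total one bucket at a time instead of all at once.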

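Request Context properties

hystrix.command.default.requestCache.enabled Default true. Requires overriding getCacheKey(); when it returns null nothing is cached.

hystrix.command.default.requestLog.enabled Whether to log to HystrixRequestLog. Default true.

Collapser properties

hystrix.collapser.default.maxRequestsInBatch Maximum number of requests in a single batch; reaching it triggers the batch. Default Integer.MAX_VALUE.

hystrix.collapser.default.timerDelayInMilliseconds Delay before a batch is triggered, that is, the batch executes this long after it is created. Default 10.

hystrix.collapser.default.requestCache.enabled Whether request caching is enabled for HystrixCollapser.execute() and HystrixCollapser.queue(). Default true.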

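ThreadPool properties

The default of 10 threads suits most cases (and can sometimes be set lower). If you need more, there is a basic formula to follow: requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room. For example, at 1000 requests per second with a 99th percentile response time of 60 ms and 12 ms of breathing room, the formula gives 1000 × (0.060 + 0.012) = 72.

The basic principle is to keep the thread pool as small as possible, since its main job is to shed load and keep resources from being blocked. When everything is healthy, a pool generally has only 1 or 2 active threads serving requests.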
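
The sizing arithmetic can be checked with a few lines of code. This sketch follows the worked example in the text, where the breathing room is expressed as extra latency; `ThreadPoolSizing` and `suggestedThreads` are hypothetical names, and times are kept in whole milliseconds so the calculation stays in integers.

```java
// Hypothetical helper for the thread pool sizing rule of thumb.
class ThreadPoolSizing {
    // peak requests per second × (p99 latency + breathing room), rounded up.
    static long suggestedThreads(long peakRequestsPerSecond,
                                 long p99LatencyMillis,
                                 long breathingRoomMillis) {
        long busyMillisPerSecond = peakRequestsPerSecond * (p99LatencyMillis + breathingRoomMillis);
        return (busyMillisPerSecond + 999) / 1000; // ceiling division by 1000 ms
    }
}
```

For the example above, `ThreadPoolSizing.suggestedThreads(1000, 60, 12)` evaluates to 72.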

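hystrix.threadpool.default.coreSize Maximum number of threads executing concurrently. Default 10.

hystrix.threadpool.default.maxQueueSize Maximum size of the BlockingQueue. When set to -1 a SynchronousQueue is used; a positive value uses a LinkedBlockingQueue. The setting only takes effect at initialization; afterwards the thread pool's queue size cannot be changed without reinitializing the thread executor. Default -1.

hystrix.threadpool.default.queueSizeRejectionThreshold Requests are rejected once the queue reaches this value, even if maxQueueSize has not been reached. Because maxQueueSize cannot be changed dynamically, this parameter lets us adjust the effective limit at runtime. It has no effect if maxQueueSize == -1.

hystrix.threadpool.default.keepAliveTimeMinutes Has no effect if corePoolSize and maxPoolSize are the same (the default implementation). It only matters with a custom implementation supplied through a plugin (https://github.com/Netflix/Hystrix/wiki/Plugins). Default 1.

hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds Length of the window for thread pool statistics. Default 10000.

hystrix.threadpool.default.metrics.rollingStats.numBuckets Number of buckets the rolling window is divided into. Default 10.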
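
The effect of maxQueueSize on the kind of queue can be sketched with the JDK's own executor classes. This is an illustration under assumptions, not Hystrix's code: `QueueChoiceSketch`, `queueFor`, and `pool` are hypothetical names, and treating any non-positive value like -1 is a choice made for the sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of how maxQueueSize selects the work queue.
class QueueChoiceSketch {
    static BlockingQueue<Runnable> queueFor(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            // No buffering: a task is handed straight to a free thread or rejected.
            return new SynchronousQueue<Runnable>();
        }
        // Bounded buffering: up to maxQueueSize tasks may wait for a thread.
        return new LinkedBlockingQueue<Runnable>(maxQueueSize);
    }

    // Core and maximum size are equal, so the keep-alive time has no practical effect.
    static ThreadPoolExecutor pool(int coreSize, int maxQueueSize) {
        return new ThreadPoolExecutor(coreSize, coreSize, 1, TimeUnit.MINUTES,
                queueFor(maxQueueSize));
    }
}
```

With a SynchronousQueue, a pool whose threads are all busy rejects new work at once, which is the fail-fast behaviour a bulkhead wants.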

Section 6: Spring Cloud Hystrix