基于自适应零行列式策略的区块链矿池合作演化方法

2019-07-31 12:14范丽郑红黄建华李忠诚江亚慧

计算机应用 2019年3期

范丽郑红黄建华李忠诚江亚慧

摘要：矿工加入矿池是目前比特币挖矿最常见的方式。然而，比特币系统中存在矿池互相渗透攻击的现象，这将导致被攻击矿池的矿工收益减少，发起攻击的矿池算力降低，从而造成比特币系统的整体算力减小。针对矿池之间互相攻击，不合作挖矿的问题，提出自适应零行列式策略（AZD），采取“比较预期合作收益与背叛收益，选择促进高收益的策略”的思想促进矿池合作。首先，通过结合时序差分增强算法与零行列式策略的方法预测下一轮合作收益与背叛收益;其次，通过决策过程（DMP）选择策略进一步改变下一轮的合作概率和背叛概率;最后，通过迭代执行自适应零行列式策略，达到网络中矿池均互相合作、积极挖矿的目的。实验模拟表明，AZD策略与自适应策略相比，合作概率收敛为1的速度提高了36.54%;与零行列式策略相比，稳定度提高了50%。这个结果表明AZD策略能够有效促进矿工合作，提高合作收敛速率，保证矿池的稳定收益。

关键词：比特币;时序差分增强算法;自适应策略方法;零行列式策略;决策过程

中图分类号： TP399

文献标志碼：A

文章编号：1001-9081（2019）03-0918-06

Abstract： At present， the most common way for bitcoin mining is miners joining in a pool. However， there is a phenomenon that the mining pools penetrate each other， which will result in a decrease in the miners' income of the attacked pools， and a reduction in computing power of the attacking pools. Therefore， the overall computing power of the bitcoin system is reduced. Aiming at the problem of mutual attack and non-cooperative mining between mining pools， an Adaptive Zero-Determinant strategy （AZD） was proposed to promote the cooperation of miners. The strategy adopted the idea of comparing expected payoff with cooperation and defection in the next round then choosing a strategy with high payoff. Firstly， miners' payoff in the next round under two situations could be predicted by the combination of Temporal Difference Learning Method （TD（λ）） and Zero-Determinant strategy （ZD）. Secondly， by comparing the cooperation payoff with defection payoff in the next round， a more favorable strategy was chosen for miners by Decision Making Process （DMP）， so the cooperation probability and defection probability in the next round were changed correspondingly. Finally， through the iterative implementation of AZD strategy， the ming pools in the network would cooperate with each other and mine actively. Simulation results show that compared with adaptive strategy， AZD strategy increases the speed of converging cooperation probability to 1 by 36.54%， compared with ZD strategy， it improves the stability by 50%. This result indicates that AZD strategy can effectively promote the cooperation of miners， improve the convergence rate of cooperation and ensure the stable income of mining pools.

Key words： bitcoin; temporal difference learning method; adaptive strategy; zero-determinant strategy; Decision Making Process （DMP）

0 引言

区块链[1-3]是一种分布式的链式数据结构，网络中每个节点上保存一个完整的副本，并通过共识算法将全网信息统一。比特币是[4-5]区块链一个成熟的应用，工作量证明算法（Proof-of-Work， PoW）[6]是比特币采用的共识机制，它保证了区块链的数据一致性和交易的难以篡改。在比特币中，节点收集交易并消耗算力争夺记账权来获取收益，这一过程叫作挖矿[7-9]。随着比特币的流行与价值增长，越来越多的节点加入网络成为矿工，网络算力的增加导致挖矿难度迅速上升[10]，单个节点成功挖矿的概率极低。为了获得稳定的收益，矿工自发聚集，形成矿池[11]，矿池成员互相合作，分享自己的工作量证明，并按照贡献分得奖励。

矿池中的矿工通过挖矿获取收益，可能存在不合作挖矿只分享收益的矿工节点。由于大部分矿池是开放的，允许任何人加入，其他矿池的忠诚节点会渗透到本矿池，从而对本矿池发起区块扣留攻击（block withholding attack）[12-13]，即从不进行工作量证明，或只向矿池管理员发送部分工作量证明，当产生完整的工作量证明就将其抛弃。区块扣留攻击不会为矿池提供有效收益，却共享矿池收益，导致被攻击矿池的收益减少，同时也使得发起攻击的矿池算力减少。

在比特币矿池的场景下，矿池也面临着相似的困境。若没有矿池选择攻击，均公平挖矿，所有矿池将获得赢得收益;若一个矿池选择攻击，它会因此获得高收益，并导致被攻击矿池收益降低;若两个矿池都选择攻击，在双方的渗透率达到纳什均衡[14]时，每个矿池的收益都将低于双方均不攻击的收益。为了获得高收益，在矿池的理性思考下，矿池均选择渗透攻击，即陷入PoW共识算法中的矿工困境[15]，类似于博弈论中的囚徒困境（Prisoner's Dilemma）[16-17]。同时，（攻击，攻击）是矿工困境唯一的纳什均衡[18-19]。在本文中，矿池积极挖矿，矿池之间互不攻击称为合作策略，矿池选择攻击称为背叛策略，即矿工困境的纳什均衡是（背叛，背叛）。矿池选择背叛策略导致矿池算力减小，交易处理速率降低，区块吞吐量减小，系统收益减少。目前，中国地区的矿池算力占比特币网络算力的81%，加入矿池是矿工挖矿最重要的方式，矿工困境是亟待解决的问题。

为了促进矿池互相合作，提高区块吞吐量，增加矿池整体收益与矿工收益，本文采用自适应零行列式（Adaptive Zero-Determinant， AZD）策略（Adaptive Zero-Determinant strategy， AZD）（简称AZD策略，采用AZD策略的矿池称为AZD矿池）对PoW共识算法进行改进。通过采用“预测下一轮合作与背叛收益，采取使得收益高的策略”的思想，基于决策过程（Decision Making Process， DMP）作出决策，同时相应改变下一轮的合作概率，迭代执行自适应零行列式策略，直到达到网络中矿池均合作的状态，最终实现合作共赢。

1 相关工作

1.1 比特币挖矿

随着比特币价值的增加和矿工对稳定收入的追求，矿工通常自发聚集，形成矿池。如图1所示，Pool 1和Pool 2是比特币系统中的两个矿池，矿池中的矿工互相合作，分享工作量证明，并按照贡献分得奖励。比特币系统中存在矿池合作挖矿与矿工单独挖矿两种挖矿策略。

一般说来，矿池的总算力与矿工数量成正比，因此，矿池挖矿的预期收益远大于单独挖矿的收益，这为矿工带来稳定的收入。矿池的挖矿机制为：矿池包含一个集中控制的管理员。管理员生成任务并发给矿工，矿工执行工作量证明算法，并向管理员发送自己的全部工作量证明，由管理员生成一个新的区块，并且将矿池的收益按贡献分配给成员。然而，矿工积极挖矿，以恒定速度产生一个新区块的情况是理想化的。

1.2 比特币中的攻击方式

Eyal[15]指出比特币系统中存在矿工渗透到其他矿池的恶意行为，例如：矿工A是矿池X的矿工，却加入到矿池Y，矿工A从不进行工作量证明，或只向矿池管理员发送部分工作量证明，当产生完整的工作量证明就将其抛弃，即矿池X向矿池Y发起区块扣留攻击。因此两个矿池挖矿存在3种情况：

1）两个矿池互不攻击，均合作挖矿，此时矿池的收益均是应得的。

2）一个矿池发起区块扣留攻击，另一矿池选择合作。假设矿池X渗透到矿池Y的节点算力为MX，Y，网络总算力为M，矿池X的总算力为MX，矿池Y的总算力为MY，则矿池X和Y的总算力占网络总算力的百分比分别是：

将式（3）代入式（4），能够由系数得出rX>rY，因此，当对手选择合作，理性矿池选择背叛能够获取更高的收益。

3）两个矿池均互相攻击。当对手选择背叛，理性矿池选择背叛才能尽可能减少损失，因此导致两个矿池均选择攻击策略，这种情况导致双方的收益均低于互不攻击的收益，同时降低网络算力，降低区块吞吐量。

这种理性思考使得区块链矿池陷入比特币矿工困境中，双方都选择攻击是矿工困境的纳什均衡，这将导致交易处理速率大大降低，区块吞吐量减小，系統收益减少。

1.3 囚徒困境中的典型策略

研究发现，节点每轮更新自己的合作伙伴将导致合作概率显著增加[20]，然而，参与者仅能够随机选择建立或断开联系的节点，不允许有选择地选择合作伙伴，同时参与者无法拒绝其他节点创建的新连接。文献[21]的工作改进克服了节点不能选择合作伙伴的不足，允许节点有选择地决定合作伙伴。建立新的联系和切断现有联系的方式导致参与者互相协调，大大增加合作可能性。

文献[22]总结了囚徒困境中参与者采取的传统策略，包括一直合作（ALways Cooperate， ALLC）、一直背叛（ALways Defect， ALLD）、针锋相对（Tit-For Tat， TFT）和赢留输改（Win-Stay Lose-Shift， WSLS），通过分析传统策略的特点，利用规则图形象描述了在不同空间结构中传统策略的合作进化。文献[23]介绍了一种用在囚徒困境中的自适应策略，在无标度网络中，博弈双方通过增强学习方法不断改进自己的策略，最终实现合作共赢。然而，采用自适应方法达到合作概率为1所需轮数较长，不能满足比特币中交易量大、交易处理速度快的现状。Press和Dyson提出了零行列式策略[24]，通过单方面设置对手的期望收益，从而获得比对手高的敲诈收益。然而，零行列式策略可以提高矿工个体收益，却不能提高矿工合作概率。

2 自适应零行列式策略

本文提出自适应零行列式策略，采用“预测下一轮合作/背叛收益，选择取得高收益策略”的思想，结合时序差分增强学习方法（Temporal Difference Learning Method， TD（λ））[25]和零行列式策略（Zero-Determinant strategy，ZD）预测下一轮收益，通过DMP选择策略，并改变下一轮的合作和背叛概率。通过迭代执行自适应零行列式策略，促进矿池相互合作，同时取得比对手高的收益，达到提高区块吞吐量、增加系统整体收益的目的。

設TD（λ）为时序差分学习方法，ZD为零行列式策略，DMP为策略选择并改变下一轮合作概率的过程。矿池上一次选择策略之后，到下一次作出选择的过程称为一轮，用t表示轮次，t = 1，2，…， N。图2描述的是第t轮中，AZD策略预测第t+1轮收益的整体架构。

表1描述了矿池不同策略下的收益。表中：C代表矿池选择合作策略，D代表矿池选择背叛策略。当两个矿池均选择合作时，双方的收益均为R;当两个矿池均选择背叛时，双方的收益均为P;当一个矿池选择合作，另一个矿池选择背叛时，合作矿池的收益为S，背叛矿池的收益为T，其中T>R>P>S，且2R>T+S。

3 实验模拟与仿真结果

为了测试自适应零行列式策略的有效性，实验模拟了无标度网络环境下，AZD矿池与普通矿池的合作演变情况。同时设计对照实验，分别设置AZD矿池的初始化合作概率为0.1，0.3，0.5，0.7，0.9，以测试不同初始化合作概率条件下，最终合作概率收敛为1时所需的迭代轮数。本文假设矿池之间进行20轮博弈，首先预测AZD矿池每一轮迭代后的合作收益与背叛收益值;其次，对AZD策略与自适应策略作出对比，并比较AZD策略与自适应策略合作概率收敛到1所需轮数;最后，对AZD策略与零行列式策略作出对比，在相同的对手收益下，比较两种策略的收益。

图3反映了在不同的初始化合作概率条件下，每一轮决策后的合作概率变化情况。如图所示，采用AZD策略的矿池的合作概率随迭代轮数的增加呈现整体上升趋势，同时，由于每一轮合作概率增加是渐变的过程，初始合作概率（P）越小，概率收敛为1所需轮数越多;反之，初始合作概率越大，概率收敛为1所需轮数就越少。

AZD矿池与采用自适应策略矿池的合作概率收敛所需轮数如图4所示。不管初始合作概率设置为多少，本文提出的基于AZD策略的算法最终合作概率收敛为1所需轮数均比自适应策略少1～3轮，比自适应策略有更好的表现。

矿池收益随轮数变化的情况如图5所示，AZD矿池合作的收益较矿池背叛的收益有明显的优势，从而吸引矿池执行合作策略。当合作概率较小时，由式（15）可知，自适应策略得到的Tcp （t+1）较小，由于AZD矿池采用动态敲诈因子χ=10/PC，由式（14）可知，零行列式策略得到的敲诈收益较大，由于零行列式策略在促进收益方面具有优势，由式（17）可知，合作收益Scp （t+1）较大;随着对DMP过程的执行，合作概率增加，自适应策略得到的Tcp （t+1）略微增大，零行列式策略得到的敲诈收益却明显减小，因此合作收益Scp （t+1）变小，从而验证图5最初收益上下波动的现象。当合作概率收敛为1后，矿池的收益稳定维持在较高水平。同时，由于合作概率收敛到1 后，敲诈因子稳定为10，由计算下一轮收益的公式得出：与不采用AZD策略的矿池收益相比，AZD矿池的平均收益提高了400%，图5也验证了此结果。

图6模拟在相同的对手收益下，AZD矿池与采用零行列式策略的矿池收益随轮数的变化情况。在决策前5轮，由于网络中存在合作概率较小的矿池，AZD矿池将获得较高敲诈型收益以保证自己的收入，因此图中显示本文采用的策略收益较高;5轮决策之后，网络中矿池合作概率均收敛为1，AZD策略不再以获取高收益为目的，而是倾向于保持合作，使收益保持在一个稳定水平。

4 结语

本文分析了比特币工作量证明机制中存在的囚徒困境问题，并借鉴博弈论中自适应方法与零行列式策略的思想优化了PoW共识机制，设计了自适应零行列式策略，采取“比较预期收益，选择收益大的策略”的思想，采用自适应与零行列式策略相结合的方法预测下一轮收益，通过DMP选择策略进一步改变下一轮的合作和背叛概率。迭代执行自适应零行列式策略，能够快速将合作概率收敛为1，即网络中矿池均采用合作策略，积极挖矿，获得高而稳定的收益。模拟实验验证了自适应零行列式策略的有效性，能够在较小的轮数内将合作概率收敛为1，同时获得比对手更高的收益。

参考文献（References）

[1] GIUNGATO P， RANA R， TARABELLA A， et al. Current trends in sustainability of bitcoins and related blockchain technology [J]. Sustainability， 2017， 9（12）： 1-11.

[2] 喻辉，张宗洋，刘建伟.比特币区块链扩容技术研究[J].计算机研究与发展，2017，54（10）：2390-2403.（YU H， ZHANG Z Y， LIU J W. Research on scaling technology of bitcoin blockchain [J]. Journal of Computer Research and Development， 2017， 54（10）： 2390-2403.）

[3] 王继业，高灵超，董爱强，等.基于区块链的数据安全共享网络体系研究[J].计算机研究与发展，2017，54（4）：742-749.（WANG J Y， GAO L C， DONG A Q， et al. Block chain based data security sharing network architecture research [J]. Journal of Computer Research and Development， 2017， 54（4）：742-749.）

[4] NAKAMOTO S. Bitcoin： a peer-to-peer electronic cash system [EB/OL]. （2008） [2018-05-03]. http：//bitcoin.org/bitcoin.pdf.

[5] DELGADO-SEGURA S， TANAS C， HERRERA-JOANCOMART J. Reputation and reward： two sides of the same bitcoin [J]. Sensors， 2016， 16（6）： 776.

[6] GARDNER-STEPHEN P. Escalating the war on SPAM through practical PoW exchange [C]// Proceedings of the 2007 15th IEEE International Conference on Networks. Piscataway， NJ： IEEE， 2007： 473-476.

[7] GOBEL J， KRZESINSKI A E. Increased block size and bitcoin blockchain dynamics [C]// Proceedings of the 2017 27th International Telecommunication Networks and Applications Conference. Piscataway， NJ： IEEE， 2017：1-6.

[8] COURTOIS N T， EMIRDAG P， WANG Z. On detection of bitcoin mining redirection attacks [C]// Proceedings of the 2015 International Conference on Information Systems Security and Privacy. Piscataway， NJ： IEEE， 2015： 98-105.

[9] VILIM M， DUWE H， KUMAR R. Approximate bitcoin mining [C]// Proceedings of the 53rd Annual Design Automation Conference. New York： ACM， 2016： Article No. 97.

[10] KRAFT D. Difficulty control for blockchain-based consensus systems [J]. Peer-to-Peer Networking and Applications， 2016， 9（2）： 397-413.

[11] LIU Y， CHEN X Y， ZHANG L， et al. An intelligent strategy to gain profit for bitcoin mining pools [C]// Proceedings of the 10th International Symposium on Computational Intelligence and Design. Piscataway， NJ： IEEE， 2017： 427-430.

[12] BAG S， RUJ S， SAKURAI K. Bitcoin block withholding attack： analysis and mitigation [J]. IEEE Transactions on Information Forensics and Security， 2017， 12（8）： 1967-1978.

[13] GBEL J， KRZESINSKI A E， KEELER H P， et al. Bitcoin blockchain dynamics： the selfish-mine strategy in the presence of propagation delay [J]. Performance Evaluation， 2016， 104： 23-41.

[14] LI J， KENDALL G， JOHN R. Computing Nash equilibria and evolutionarily stable states of evolutionary games [J]. IEEE Transactions on Evolutionary Computation， 2016， 20（3）： 460-469.

[15] EYAL I. The miner's dilemma [C]// Proceedings of the 2015 IEEE Symposium on Security and Privacy. Piscataway， NJ： IEEE， 2015： 89-103.

[16] KENTER F， MEIGS E. Iterated prisoner's dilemma with extortionate zero-determinant strategies and random-memory opponents [C]// Proceedings of the 2016 IEEE Congress on Evolutionary Computation. Piscataway， NJ： IEEE， 2016： 3499-3506.

[17] KOSTYUK N. The digital prisoner's dilemma： challenges and opportunities for cooperation [C]// Proceedings of the 2013 World Cyberspace Cooperation Summit Ⅳ. Piscataway， NJ： IEEE， 2015： 1-6.

KOSTYUK N. The digital prisoner's dilemma： challenges and opportunities for cooperation [EB/OL]. [2018-07-11]. http：//cybersummit.info/sites/cybersummit.info/files/The%20Digital%20Prisoner's%20Dilemma-Challenges%20and%20Opportunities%20for%20Cooperation_Nadiya%20Kostyuk%20.pdf.

[18] CARBONELL-NICOLAU O， McLEAN R P. On the existence of Nash equilibrium in Bayesian games [J]. GeomorphologyMathematics of Operations Research， 2017， 60（S1/2）：73-87.

[19] BARLOW L A. The impact of Agent size and number of rounds on cooperation in the iterated prisoner's dilemma [C]// Proceedings of 2014 IEEE Symposium on Foundations of Computational Intelligence. Piscataway， NJ： IEEE， 2014： 120-127.

[20] FEHL K， van der POST D J， SEMMANN D. Co-evolution of behaviour and social network structure promotes human cooperation [J]. Ecology Letters， 2011， 14（6）： 546-551.

[21] WANG J， SURI S， WATTS D J. Cooperation and assortativity with dynamic partner updating [J]. Proceedings of the National Academy of Sciences of the United States of America， 2012， 109（36）： 14363-14368.

[22] 李勇.重復囚徒困境模型中零行列式策略的研究[D].苏州：苏州大学，2015：1-50.（LI Y. Study on zero-determinant strategies in Iterated prisoner's dilemma game [D]. Suzhou： Soochow University， 2015： 1-50.）

[23] XUE L， SUN C Y， WUNSCH D， et al. An adaptive strategy via reinforcement learning for the prisoner's dilemma game [J]. IEEE/CAA Journal of Automatica Sinica， 2018， 5（1）： 1-10.

[24] PRESS W H， DYSON F J. Iterated prisoner's dilemma contains strategies that dominate any evolutionary opponent [J]. Proceedings of the National Academy of Sciences of the United States of America， 2012， 109（26）： 10409-10413.

[25] BABA N， TAKAGAWARA Y. Application of temporal difference learning method to computer gaming [C]// Proceedings of the 1998 2nd International Conference on Knowledge-based Intelligent Electronic Systems. Piscataway， NJ： IEEE， 1998： 36-39.