Title: Unified continuous-time q-learning for mean-field game and mean-field control problems
Speaker: Xiaoli Wei (魏曉利)
Time: Thursday, November 20, 2025, 19:00-20:00
Venue: Tencent Meeting (meeting ID: 706282801)
Abstract: In this talk, we study continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider a learning procedure in which the representative agent updates the population distribution based on the agent's own state values. Depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function can be employed differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
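About the speaker: Xiaoli Wei (魏曉利) is a tenure-track associate professor at Harbin Institute of Technology. Wei received a bachelor's degree from the University of Science and Technology of China and a Ph.D. from Paris Diderot University (Paris 7) in 2018, was a postdoctoral researcher at the University of California, Berkeley from 2019 to 2021, and worked at Tsinghua Shenzhen International Graduate School from 2021 to 2023. Wei's research focuses on stochastic differential games and reinforcement learning, among other topics, with papers published in journals including Operations Research, Mathematical Finance, and SIAM Journal on Control and Optimization.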