[Deep Learning Theory Team Seminar] Talk by Prof. Difan Zou (HKU) on Understanding the Working Mechanism of Transformers
Event description
Venue: Online and the Open Space at the RIKEN AIP Nihonbashi office
Language: English
Title: Understanding the Working Mechanism of Transformers: Model Depth and Multi-head Attention
Speaker: Prof. Difan Zou, HKU, https://difanzou.github.io/
Abstract:
In this talk, I will discuss our recent work on the working mechanism of the Transformer architecture, including the learning capabilities and limitations associated with model depth and the multi-head attention mechanism in different tasks. In the first part of the talk, I will describe a series of learning tasks that we designed based on actual sequences, which we used to systematically evaluate the performance and limitations of Transformers of different depths in terms of memory, reasoning, generalization, and context generalization capabilities. We show that a Transformer with a single attention layer performs excellently on memory tasks but cannot complete more complex tasks. In addition, a Transformer with at least two layers is required to achieve reasoning and generalization capabilities, while context generalization may require three layers.
In the second part of the talk, considering the sparse linear regression problem, we explore the role of multi-head attention in a trained Transformer and reveal how multi-head attention works in different Transformer layers. First, we found experimentally that every attention head in the first layer of the Transformer is important for the final performance, while in subsequent layers usually only one attention head plays an important role. We then propose a preprocess-then-optimize working mechanism and prove theoretically that a multi-layer Transformer (multiple heads in the first layer and only one head in subsequent layers) can implement this mechanism. Moreover, for the sparse linear regression problem, we prove that this mechanism is superior to naive gradient descent and ridge regression, which is consistent with the experimental findings. These results deepen our understanding of the advantages of multi-head attention and the role of model depth, and provide a new perspective for revealing more complex mechanisms inside the Transformer.
Bio: Dr. Difan Zou is an assistant professor in the computer science department and the institute of data science at HKU. He received his PhD from the Department of Computer Science, University of California, Los Angeles (UCLA). His research interests are broadly in machine learning, deep learning theory, graph learning, and interdisciplinary research between AI and other subjects. His research has been published in top-tier machine learning conferences (ICML, NeurIPS, COLT, ICLR) and journals (IEEE Trans., Nature Comm., PNAS, etc.). He serves as an area chair/senior PC member for NeurIPS and AAAI, and as a PC member for ICML, ICLR, COLT, etc.
Date and time
November 20, 2024, 14:00 to 15:00
Organizer / Contact
RIKEN AIP Public