FPGA Development Diary

Article index by category: https://msyksphinz.github.io/github_pages , English version: https://fpgadevdiary.hatenadiary.com/

Survey of International Computer Architecture Conferences (1. ISCA, continued)

I wanted to get a feel for recent research trends in computer architecture, so I decided to survey the major conferences.

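I'll start with ISCA (International Symposium on Computer Architecture). I went through the 2021 program and pulled out the abstracts to get a rough sense of the trends. Each comment below is a short summary of the paper (originally written in Japanese); more will be added later. On the architecture side, many papers seem to target performance techniques such as prefetching and vector processing. On the memory side, security and coherency seem to be common themes.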

  • ISCA International Symposium on Computer Architecture
    • Industry Track (6)
    • Microarchitecture (6)
      • Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-Level Cache Evictions
        • About coherence control in the LLC
      • Exploiting Page Table Locality for Agile TLB Prefetching
        • Speeding up TLB prefetching
      • A Cost-Effective Entangling Prefetcher for Instructions
        • A highly efficient instruction prefetcher
      • Vector Runahead
        • About speculative memory loads
      • Unlimited Vector Extension with Data Streaming Support
        • Proposes a new type of vector instruction
      • Speculative Vectorisation with Selective Replay
        • Speculative vectorization using selective replay
    • Memory (10)
      • Don't Forget the I/O When Allocating Your LLC
        • Performance analysis of the LLC
      • PF-DRAM: A Precharge-Free DRAM Structure
        • Circuit-level work: a precharge-free DRAM
      • Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
      • CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations
        • Designs CODIC, a new low-cost DRAM substrate that enables fine-grained control over four previously fixed internal DRAM timings
      • NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM
        • A scalable and efficient technique for taking frequent persistent snapshots to NVM so that they can later be accessed randomly
      • Rebooting Virtual Memory with Midgard
      • Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication
        • A hardware-driven replication mechanism that replicates data blocks across two different sockets in a cache-coherent NUMA system
      • Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications
        • Profiles the program and uses program context to inform the underlying replacement policy about efficient replacement decisions
      • Quantifying Server Memory Frequency Margin and Using It to Improve Performance in HPC Systems
        • Presents the first public study characterizing the frequency margin of general-purpose server memory modules
      • Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution
        • Proposes HAMS, a hardware-automated Memory-over-Storage (MoS) solution
    • Machine Learning (7)
      • Many of these seem to be "we built one" style papers
      • RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference
      • REDUCT: Keep It Close, Keep It Cool! - Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute
        • Builds an innovative solution that bypasses the traditional CPU resources that affect DNN inference capability and limit its performance
      • Communication Algorithm-Architecture Co-Design for Distributed Deep Learning
        • Proposes the MultiTree all-reduce algorithm, which is aware of topology and resource utilization, for efficient and scalable all-reduce operations
      • SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations
        • SPACE leverages compute-capable 3D-stacked DRAM together with DIMMs for personalized recommendations
      • ELSA: Hardware-Software Co-Design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks
        • A hardware-software co-design solution that substantially reduces the execution time and energy spent on the self-attention mechanism
      • Cambricon-Q: A Hybrid Architecture for Efficient Training
        • Cambricon-Q features a hybrid architecture consisting of an ASIC acceleration core and a near-data-processing (NDP) engine
      • TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation
    • Processing in/near Memory (4)
      • ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-Based Near-Memory Processing with Inter-DIMM Broadcast
      • Sieve: Scalable In-Situ DRAM-Based Accelerator Designs for Massively Parallel k-mer Matching
      • FORMS: Fine-Grained Polarized ReRAM-Based In-Situ Computation for Mixed-Signal DNN Accelerator
      • BOSS: Bandwidth-Optimized Search Accelerator for Storage-Class Memory
    • Data Center (4)
      • SATORI: Efficient and Fair Resource Partitioning by Sacrificing Short-Term Benefits for Long-Term Gains
      • Confidential Serverless Made Efficient with Plug-In Enclaves
      • Flex: High-Availability Datacenters with Zero Reserved Power
      • BlockMaestro: Enabling Programmer-Transparent Task-Based Execution in GPU Systems
    • Security (3)
      • Opening Pandora's Box: A Systematic Study of New Ways Microarchitecture Can Leak Private Data
      • I See Dead μops: Leaking Secrets via Intel/AMD Micro-Op Caches
      • TimeCache: Using Time to Eliminate Cache Side Channels when Sharing Software
    • Accelerator (10)
      • Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees
      • Aurochs: An Architecture for Dataflow Threads
      • PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture
      • Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
      • CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
      • η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities
      • NN-Baton: DNN Workload Orchestration and Chiplet Granularity Exploration for Multichip Accelerators
      • SNAFU: An Ultra-Low-Power, Energy-Minimal CGRA-Generation Framework and Architecture
      • SARA: Scaling a Reconfigurable Dataflow Accelerator
      • HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
    • Compiler (4)
      • Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures
      • Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach
      • Execution Dependence Extension (EDE): ISA Support for Eliminating Fences
      • Hetero-ViTAL: A Virtualization Stack for Heterogeneous FPGA Clusters
    • Graph Processing (4)
      • FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining
      • PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators
      • Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses
    • Low Temperature (4)
      • Cost-Efficient Overclocking in Immersion-Cooled Datacenters
      • CryoGuard: A Near Refresh-Free Robust DRAM Design for Cryogenic Computing
      • Superconducting Computing with Alternating Logic Elements
      • Failure Sentinels: Ubiquitous Just-in-Time Intermittent Computation via Low-Cost Hardware Support for Voltage Monitoring
    • Network Storage Acceleration (3)
      • NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks
      • NASA: Accelerating Neural Network Design with a NAS Processor
      • PMNet: In-Network Data Persistence
    • Quantum / Photonics (4)
      • Exploiting Long Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures
      • Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors
      • Designing Calibration and Expressivity-Efficient Instruction Sets for Quantum Computing
      • Albireo: Energy-Efficient Acceleration of Convolutional Neural Networks via Silicon Photonics
    • Reliability & Security (4)
      • IntroSpectre: A Pre-Silicon Framework for Discovery and Analysis of Transient Execution Vulnerabilities
      • Maya: Using Formal Control to Obfuscate Power Side Channels
      • Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers
      • No-FAT: Architectural Support for Low Overhead Memory Safety Checks
    • DRAM / IO / Network (3)
      • Ghost Routing to Enable Oblivious Computation on Memory-Centric Networks
      • QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips
      • A RISC-V In-Network Accelerator for Flexible High-Performance Low-Power Packet Processing
    • Sparse Processing (7)
      • Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems
      • IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors
      • ZeRØ: Zero-Overhead Resilient Operation Under Pointer Integrity Attacks
      • SpZip: Architectural Support for Effective Data Compression In Irregular Applications
      • Dual-Side Sparse Tensor Core
      • RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging
      • GoSPA: An Energy-Efficient High-Performance Globally Optimized SParse Convolutional Neural Network Accelerator