I wanted to get a rough sense of recent research trends in computer architecture, so I decided to survey the major conferences.
First up is ISCA (International Symposium on Computer Architecture). I pulled the abstracts from the 2021 program to get a feel for the trends, adding a brief summary comment for each paper; I will fill in more later. On the architecture side, speedup techniques such as prefetching and vector execution seem common; on the memory side, security and coherency come up a lot.
ISCA: International Symposium on Computer Architecture
- Industry Track (6)
- Microarchitecture (6)
- Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-Level Cache Evictions
- On coherence control for the inclusive last-level cache (LLC).
- Exploiting Page Table Locality for Agile TLB Prefetching
- Faster TLB prefetching.
- A Cost-Effective Entangling Prefetcher for Instructions
- A cost-effective instruction prefetcher.
- Vector Runahead
- On speculative memory loads.
- Unlimited Vector Extension with Data Streaming Support
- Proposes a new type of vector instruction set.
- Speculative Vectorisation with Selective Replay
- Speculative vectorization via selective replay.
- Memory (10)
- Don't Forget the I/O When Allocating Your LLC
- Performance analysis of the LLC.
- PF-DRAM: A Precharge-Free DRAM Structure
- Circuit-level work: a precharge-free DRAM structure.
- Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
- CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations
- NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM
- A scalable and efficient technique for frequently taking persistent snapshots to NVM so that they can be randomly accessed later.
- Rebooting Virtual Memory with Midgard
- Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication
- A hardware-driven replication mechanism for cache-coherent NUMA systems that replicates data blocks across two different sockets.
- Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications
- Profiles the program and uses program context to inform the replacement policy, enabling more efficient replacement decisions.
- Quantifying Server Memory Frequency Margin and Using It to Improve Performance in HPC Systems
- The first published study characterizing the frequency margin of commodity server memory modules.
- Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution
- Proposes HAMS, a hardware-automated memory-over-storage (MoS) solution.
- Machine Learning (7)
- RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference
- REDUCT: Keep It Close, Keep It Cool! - Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute
- Communication Algorithm-Architecture Co-Design for Distributed Deep Learning
- SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations
- ELSA: Hardware-Software Co-Design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks
- Cambricon-Q: A Hybrid Architecture for Efficient Training
- TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation
- Processing in/near Memory (4)
- ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-Based Near-Memory Processing with Inter-DIMM Broadcast
- Sieve: Scalable In-Situ DRAM-Based Accelerator Designs for Massively Parallel k-mer Matching
- FORMS: Fine-Grained Polarized ReRAM-Based In-Situ Computation for Mixed-Signal DNN Accelerator
- BOSS: Bandwidth-Optimized Search Accelerator for Storage-Class Memory
- Data Center (4)
- SATORI: Efficient and Fair Resource Partitioning by Sacrificing Short-Term Benefits for Long-Term Gains
- Confidential Serverless Made Efficient with Plug-In Enclaves
- Flex: High-Availability Datacenters with Zero Reserved Power
- BlockMaestro: Enabling Programmer-Transparent Task-Based Execution in GPU Systems
- Security (3)
- Accelerator (10)
- Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees
- Aurochs: An Architecture for Dataflow Threads
- PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture
- Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
- η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities
- NN-Baton: DNN Workload Orchestration and Chiplet Granularity Exploration for Multichip Accelerators
- SNAFU: An Ultra-Low-Power, Energy-Minimal CGRA-Generation Framework and Architecture
- SARA: Scaling a Reconfigurable Dataflow Accelerator
- HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
- Compiler (4)
- Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures
- Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach
- Execution Dependence Extension (EDE): ISA Support for Eliminating Fences
- Hetero-ViTAL: A Virtualization Stack for Heterogeneous FPGA Clusters
- Graph Processing (4)
- FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining
- PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators
- Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses
- Low Temperature (4)
- Cost-Efficient Overclocking in Immersion-Cooled Datacenters
- CryoGuard: A Near Refresh-Free Robust DRAM Design for Cryogenic Computing
- Superconducting Computing with Alternating Logic Elements
- Failure Sentinels: Ubiquitous Just-in-Time Intermittent Computation via Low-Cost Hardware Support for Voltage Monitoring
- Network Storage Acceleration (3)
- Quantum / Photonics (4)
- Exploiting Long Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures
- Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors
- Designing Calibration and Expressivity-Efficient Instruction Sets for Quantum Computing
- Albireo: Energy-Efficient Acceleration of Convolutional Neural Networks via Silicon Photonics
- Reliability & Security (4)
- IntroSpectre: A Pre-Silicon Framework for Discovery and Analysis of Transient Execution Vulnerabilities
- Maya: Using Formal Control to Obfuscate Power Side Channels
- Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers
- No-FAT: Architectural Support for Low Overhead Memory Safety Checks
- DRAM / IO / Network (3)
- Sparse Processing (7)
- Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems
- IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors
- ZeRØ: Zero-Overhead Resilient Operation Under Pointer Integrity Attacks
- SpZip: Architectural Support for Effective Data Compression In Irregular Applications
- Dual-Side Sparse Tensor Core
- RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging
- GoSPA: An Energy-Efficient High-Performance Globally Optimized SParse Convolutional Neural Network Accelerator