SAUCE
High Performance Computing 2018
Lecture 3: AVX SOA normalization
Assignment
Scaffold Head
```cpp
#include <random>       // prng support
#include <cstdint>      // uint64_t
#include <cmath>        // std::sqrt
#include <iostream>     // std::cout
#include <immintrin.h>  // AVX intrinsics

// timers distributed with this book
#include "include/hpc_helpers.hpp"

// if we do not have fused multiply-add, we pretend that we have it;
// FMA3 is its own extension (macro __FMA__), separate from AVX2 (__AVX2__)
#ifndef __FMA__
#define _mm256_fmadd_ps legacy_fmad
__m256 legacy_fmad (__m256 x, __m256 y, __m256 z) {
    return _mm256_add_ps(_mm256_mul_ps(x, y), z);
}
#endif

// memory aligned arrays can be used as usual
void init(float * x, float * y, float * z, uint64_t length) {
    std::mt19937 engine(42);
    std::uniform_real_distribution<float> dist(-1.0, +1.0);

    for (uint64_t i = 0; i < length; i++) {
        x[i] = dist(engine);
        y[i] = dist(engine);
        z[i] = dist(engine);
    }
}

// reports the first vector whose squared norm deviates from 1 by more than 1E-3
void check(float * x, float * y, float * z, uint64_t length) {
    for (uint64_t i = 0; i < length; i++) {
        const float rho = x[i]*x[i]+y[i]*y[i]+z[i]*z[i];
        if ((rho-1)*(rho-1) > 1E-6) {
            std::cout << "ERROR: at position " << i << " with norm "
                      << std::sqrt(rho) << std::endl;
            break;
        }
    }
}

// scalar reference: scales every (x[i], y[i], z[i]) to unit length
void plain_soa_norm(float * x, float * y, float * z, uint64_t length) {
    for (uint64_t i = 0; i < length; i++) {
        const float rho = x[i]*x[i]+y[i]*y[i]+z[i]*z[i];
        const float irho = float(1)/std::sqrt(rho);
        x[i] *= irho;
        y[i] *= irho;
        z[i] *= irho;
    }
}
```
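The scaffold head does not define `avx_soa_norm`, which the foot calls, so that is the function a submission has to provide. Below is a minimal sketch of one possible version, not the course's reference solution, assuming x86 with GCC or Clang. It loads 8 consecutive floats from each of the three arrays, computes the squared norms lane-wise, multiplies by the reciprocal square root and finishes the tail scalar-wise. The helper names `scalar_soa_norm_range` and `avx_soa_norm_kernel` are hypothetical, and the `__attribute__((target("avx")))` plus `__builtin_cpu_supports("avx")` dispatch is an addition so the sketch also compiles without `-mavx` and falls back to scalar code on CPUs without AVX.

```cpp
#include <immintrin.h>  // AVX intrinsics
#include <cmath>        // std::sqrt
#include <cstdint>      // uint64_t

// hypothetical helper: scalar normalization of entries [begin, end)
static void scalar_soa_norm_range(float * x, float * y, float * z,
                                  uint64_t begin, uint64_t end) {
    for (uint64_t i = begin; i < end; i++) {
        const float irho = 1.0f / std::sqrt(x[i]*x[i] + y[i]*y[i] + z[i]*z[i]);
        x[i] *= irho;
        y[i] *= irho;
        z[i] *= irho;
    }
}

// hypothetical helper: AVX kernel handling 8 vectors per iteration;
// target("avx") (GCC/Clang) enables the intrinsics without -mavx
__attribute__((target("avx")))
static void avx_soa_norm_kernel(float * x, float * y, float * z, uint64_t length) {
    const uint64_t vec_end = length - length % 8;  // largest multiple of 8
    const __m256 one = _mm256_set1_ps(1.0f);

    for (uint64_t i = 0; i < vec_end; i += 8) {
        // aligned loads: x, y, z must be 32-byte aligned, e.g. from _mm_malloc(.., 32)
        const __m256 X = _mm256_load_ps(x + i);
        const __m256 Y = _mm256_load_ps(y + i);
        const __m256 Z = _mm256_load_ps(z + i);

        // rho = x*x + y*y + z*z for 8 vectors at once
        const __m256 rho = _mm256_add_ps(
            _mm256_add_ps(_mm256_mul_ps(X, X), _mm256_mul_ps(Y, Y)),
            _mm256_mul_ps(Z, Z));

        // exact 1/sqrt(rho); _mm256_rsqrt_ps would be a faster approximation
        const __m256 irho = _mm256_div_ps(one, _mm256_sqrt_ps(rho));

        _mm256_store_ps(x + i, _mm256_mul_ps(X, irho));
        _mm256_store_ps(y + i, _mm256_mul_ps(Y, irho));
        _mm256_store_ps(z + i, _mm256_mul_ps(Z, irho));
    }

    // remaining length % 8 entries
    scalar_soa_norm_range(x, y, z, vec_end, length);
}

// the function called by the scaffold foot
void avx_soa_norm(float * x, float * y, float * z, uint64_t length) {
    if (__builtin_cpu_supports("avx"))
        avx_soa_norm_kernel(x, y, z, length);
    else
        scalar_soa_norm_range(x, y, z, 0, length);
}
```

Placed between the scaffold head and foot, this supplies the missing `avx_soa_norm`. The exact `_mm256_sqrt_ps`/`_mm256_div_ps` pair keeps the results within a few ulp of `plain_soa_norm`; switching to `_mm256_rsqrt_ps` trades accuracy for speed. The `_mm256_fmadd_ps` provided by the head could replace the mul/add chain when FMA is available.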
Scaffold Foot
```cpp
int main () {
    const uint64_t num_entries = 1UL << 26;
    const uint64_t num_bytes = num_entries*sizeof(float);

    // 32-byte alignment matches the width of a 256-bit AVX register
    TIMERSTART(alloc_memory)
    auto x = static_cast<float*>(_mm_malloc(num_bytes , 32));
    auto y = static_cast<float*>(_mm_malloc(num_bytes , 32));
    auto z = static_cast<float*>(_mm_malloc(num_bytes , 32));
    TIMERSTOP(alloc_memory)

    TIMERSTART(plain_init)
    init(x, y, z, num_entries);
    TIMERSTOP(plain_init)

    TIMERSTART(plain_soa_norm)
    plain_soa_norm(x, y, z, num_entries);
    TIMERSTOP(plain_soa_norm)

    TIMERSTART(plain_check)
    check(x, y, z, num_entries);
    TIMERSTOP(plain_check)

    TIMERSTART(avx_init)
    init(x, y, z, num_entries);
    TIMERSTOP(avx_init)

    // avx_soa_norm is defined in neither the scaffold head nor the foot;
    // it has to come from the submission
    TIMERSTART(avx_soa_norm)
    avx_soa_norm(x, y, z, num_entries);
    TIMERSTOP(avx_soa_norm)

    TIMERSTART(avx_check)
    check(x, y, z, num_entries);
    TIMERSTOP(avx_check)

    TIMERSTART(free_memory)
    _mm_free(x);
    _mm_free(y);
    _mm_free(z);
    TIMERSTOP(free_memory)

    std::cout << "Parallel programming is fun!" << std::endl;
}
```
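The foot runs `check` on the AVX result, which accepts a vector when its squared norm lies within 10⁻³ of 1. A short derivation shows how much approximation that tolerance leaves. It assumes the kernel multiplies by an approximate reciprocal square root with relative error ε, for example `_mm256_rsqrt_ps`, whose maximum relative error Intel documents as below 1.5·2⁻¹². Here x_i, y_i, z_i are the original components, ρ_i is the squared norm that `check` computes after normalization, and float rounding in the products themselves (of order 10⁻⁷) is neglected.

```latex
\begin{align*}
\text{\texttt{check} fails at } i &\iff (\rho_i - 1)^2 > 10^{-6} \iff |\rho_i - 1| > 10^{-3},\\
\tilde r_i &= \frac{1+\varepsilon_i}{\sqrt{x_i^2+y_i^2+z_i^2}},
  \qquad |\varepsilon_i| \le 1.5\cdot 2^{-12} \approx 3.66\cdot 10^{-4},\\
\rho_i &= \left(x_i^2+y_i^2+z_i^2\right)\tilde r_i^{\,2} = (1+\varepsilon_i)^2,\\
|\rho_i - 1| &= |2\varepsilon_i + \varepsilon_i^2|
  \le 2\cdot 1.5\cdot 2^{-12} + \left(1.5\cdot 2^{-12}\right)^2 \approx 7.3\cdot 10^{-4} < 10^{-3}.
\end{align*}
```

Under these assumptions the approximate reciprocal square root alone stays inside the tolerance, while the exact `sqrt` and division pass with a wide margin.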
Start time: Mon 22 Oct 2018 10:51:00
End time: Mon 01 Apr 2019 12:00:00
General test timeout: 10.0 seconds
Tests
Comment prefix: #
Given input:
Expected output: Parallel programming is fun!