Parallel Algorithms and Architectures 2017
Pair Programming
Sheet 3 (Quaternion Normalization, Matrix Transposition)
Sheet 4 (Array Reversal, Determinants)
Sheet 5 (Prefix Scan, Knapsack)
Sheet 6 (Piecewise constant kernels, FFT-Convolution)
Sheet 7 (Sparse Matrices, Page Rank)
Sheet 8 (Streams, Multi-GPU)
Sheet 9 (Jacobi Iteration)
Lecture 3 (Vector Addition)
Lecture 4 (Matrix Multiplication)
Lecture 5 (Prefix Scan)
Lecture 6 (1D Convolution)
Lecture 7 (SpMV/ELL)
Lecture 8 (Streams)
Lecture 9 (CUDA-aware MPI)
Lecture 10 (Parallel Merge)
Lecture 3 (Vector Addition)
Assignment
Scaffold Head
#include <iostream>      // cout, endl
#include <algorithm>     // iota, fill
#include <omp.h>         // benchmark below (multi-threading with OpenMP pragmas)

///////////////////////////////////////////////////////////////////////////////
// IGNORE THESE HELPERS (taken from https://github.com/gravitino/cudahelpers)
///////////////////////////////////////////////////////////////////////////////

// safe division
#define SDIV(x,y) (((x)+(y)-1)/(y))

// error macro
#define CUERR { \
    cudaError_t err; \
    if ((err = cudaGetLastError()) != cudaSuccess) { \
        std::cout << "CUDA error: " << cudaGetErrorString(err) << " : " \
                  << __FILE__ << ", line " << __LINE__ << std::endl; \
        exit(1); \
    } \
}

// convenient timers
#define TIMERSTART(label) \
    cudaEvent_t start##label, stop##label; \
    float time##label; \
    cudaEventCreate(&start##label); \
    cudaEventCreate(&stop##label); \
    cudaEventRecord(start##label, 0);

#define TIMERSTOP(label) \
    cudaEventRecord(stop##label, 0); \
    cudaEventSynchronize(stop##label); \
    cudaEventElapsedTime(&time##label, start##label, stop##label); \
    std::cout << "#" << time##label \
              << " ms (" << #label << ")" << std::endl;
Scaffold Foot
    ///////////////////////////////////////////////////////////////////////////
    // BENCHMARKS AND CHECKS (you may ignore this, especially the OpenMP part)
    ///////////////////////////////////////////////////////////////////////////

    // check for correct result computed by CUDA
    for (size_t index = 0; index < N; index++) {
        if (c[index] != a[index]+b[index]) {
            std::cout << "error at position " << index << std::endl;
            break;
        }
    }

    // measure time for vector addition on single-threaded host
    TIMERSTART(overallSingleCore)
    for (size_t index = 0; index < N; index++)
        c[index] = a[index]+b[index];
    TIMERSTOP(overallSingleCore)

    // measure time for vector addition on multi-threaded host
    TIMERSTART(overallMultiCore)
    # pragma omp parallel for
    for (size_t index = 0; index < N; index++)
        c[index] = a[index]+b[index];
    TIMERSTOP(overallMultiCore)

    // get rid of the memory
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    cudaFreeHost(a);
    cudaFreeHost(b);
    cudaFreeHost(c);

    // print status
    float usedMem = 3.0*N*sizeof(float)/(1L<<30);
    std::cout << "#processed " << usedMem << " gigabytes." << std::endl;
    std::cout << "CUDA programming is fun!" << std::endl;
}
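For orientation, a minimal, self-contained sketch of what the submitted middle part could look like between scaffold head and foot. The kernel name vector_add and the vector length N are assumptions, not taken from the assignment; the pinned host buffers a, b, c and device buffers A, B, C match the cudaFreeHost/cudaFree calls in the scaffold foot. The scaffold's SDIV macro is repeated here so the sketch compiles on its own.

```cuda
#include <iostream>      // cout, endl
#include <algorithm>     // iota, fill

// ceiling division, as defined in the scaffold head
#define SDIV(x,y) (((x)+(y)-1)/(y))

// one thread per vector entry; the guard handles N not divisible by blockDim
__global__ void vector_add(const float *A, const float *B, float *C, size_t N) {
    const size_t index = blockDim.x*blockIdx.x + threadIdx.x;
    if (index < N)
        C[index] = A[index] + B[index];
}

int main() {
    const size_t N = 1L << 20;   // vector length (an assumption; pick as needed)

    // pinned host memory, matching the cudaFreeHost calls in the scaffold foot
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaMallocHost(&a, sizeof(float)*N);
    cudaMallocHost(&b, sizeof(float)*N);
    cudaMallocHost(&c, sizeof(float)*N);

    // device memory, matching the cudaFree calls in the scaffold foot
    float *A = nullptr, *B = nullptr, *C = nullptr;
    cudaMalloc(&A, sizeof(float)*N);
    cudaMalloc(&B, sizeof(float)*N);
    cudaMalloc(&C, sizeof(float)*N);

    // fill the inputs: a = 0,1,2,...  b = 1,1,1,...
    std::iota(a, a+N, 0.0f);
    std::fill(b, b+N, 1.0f);

    // host -> device, one thread per entry, device -> host
    cudaMemcpy(A, a, sizeof(float)*N, cudaMemcpyHostToDevice);
    cudaMemcpy(B, b, sizeof(float)*N, cudaMemcpyHostToDevice);
    vector_add<<<SDIV(N, 1024), 1024>>>(A, B, C, N);
    cudaMemcpy(c, C, sizeof(float)*N, cudaMemcpyDeviceToHost);

    // verify on the host, as the scaffold foot does
    for (size_t index = 0; index < N; index++)
        if (c[index] != a[index]+b[index]) {
            std::cout << "error at position " << index << std::endl;
            break;
        }

    // release memory
    cudaFree(A); cudaFree(B); cudaFree(C);
    cudaFreeHost(a); cudaFreeHost(b); cudaFreeHost(c);

    std::cout << "CUDA programming is fun!" << std::endl;
}
```

In a real submission the timing macros (TIMERSTART/TIMERSTOP) and the CUERR error check after each CUDA call come from the scaffold head, and the verification, cleanup, and final print are already provided by the scaffold foot.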
Start time:
Mon 24 Apr 2017 11:59:00
End time:
Sun 01 Oct 2017 00:00:00
General test timeout:
10.0 seconds
Tests
Comment prefix
#
Given input
Expected output
CUDA programming is fun!