Parallel Algorithms and Architectures 2018
Interactive Programming Exercises in the Lecture
Sheet 3
Sheet 4 (Array Reversal, Determinants)
Sheet 5 (Knapsack)
Sheet 6 (Warp Intrinsics)
Sheet 7 (Sparse Matrices, Page Rank)
Sheet 8 (Streams, Multi-GPU)
Sheet 9 (Jacobi Iteration)
Lecture 3: Vector Addition
Lecture 4: Covariance Matrix
Lecture 5: Dynamic Time Warping
Lecture 6: Prefix Scan
Lecture 7: SpMV/ELL
Lecture 8: Streamed Covariance Matrix Computation
Lecture 9: MPI Matrix Multiplication
Lecture 10: Parallel Merge
Lecture 4: Covariance Matrix
Assignment
Scaffold Head
#include "hpc_helpers.hpp"
#include "binary_IO.hpp"

template <
    typename index_t,
    typename value_t> __global__
void compute_mean_kernel(
    value_t * Data,
    value_t * Mean,
    index_t num_entries,
    index_t num_features) {

    auto thid = blockDim.x*blockIdx.x + threadIdx.x;

    if (thid < num_features) {
        value_t accum = 0;

        #pragma unroll 32
        for (index_t entry = 0; entry < num_entries; entry++)
            accum += Data[entry*num_features+thid];

        Mean[thid] = accum/num_entries;
    }
}

template <
    typename index_t,
    typename value_t> __global__
void correction_kernel(
    value_t * Data,
    value_t * Mean,
    index_t num_entries,
    index_t num_features) {

    auto thid = blockDim.x*blockIdx.x + threadIdx.x;

    if (thid < num_features) {
        value_t value = Mean[thid];

        for (index_t entry = 0; entry < num_entries; entry++)
            Data[entry*num_features+thid] -= value;
    }
}
Scaffold Foot
int main (int argc, char * argv[]) {

    // set the identifier of the used CUDA device
    cudaSetDevice(0);
    cudaDeviceReset();

    // 202599 grayscale images, each of shape 55 x 45
    constexpr uint64_t imgs = 202599, rows = 55, cols = 45;

    // host pointers for the data matrix and the covariance matrix
    float * data = nullptr, * cov = nullptr;
    cudaMallocHost(&data, sizeof(float)*imgs*rows*cols);           CUERR
    cudaMallocHost(&cov, sizeof(float)*rows*cols*rows*cols);       CUERR

    // allocate storage on GPU
    float * Data = nullptr, * Mean = nullptr, * Cov = nullptr;
    cudaMalloc(&Data, sizeof(float)*imgs*rows*cols);               CUERR
    cudaMalloc(&Mean, sizeof(float)*rows*cols);                    CUERR
    cudaMalloc(&Cov, sizeof(float)*rows*cols*rows*cols);           CUERR

    // load data matrix from disk
    TIMERSTART(read_data_from_disk)
    auto file_name = "celebA_gray_lowres.202599_55_45_32.bin";
    load_binary(data, imgs*rows*cols, file_name);
    TIMERSTOP(read_data_from_disk)

    // copy data to device and reset Mean
    TIMERSTART(data_H2D)
    cudaMemcpy(Data, data, sizeof(float)*imgs*rows*cols,
               cudaMemcpyHostToDevice);                            CUERR
    cudaMemset(Mean, 0, sizeof(float)*rows*cols);                  CUERR
    TIMERSTOP(data_H2D)

    // compute mean
    TIMERSTART(compute_mean_kernel)
    compute_mean_kernel<<<SDIV(rows*cols, 1024), 1024>>>
        (Data, Mean, imgs, rows*cols);                             CUERR
    TIMERSTOP(compute_mean_kernel)

    // subtract the mean from each entry
    TIMERSTART(correction_kernel)
    correction_kernel<<<SDIV(rows*cols, 1024), 1024>>>
        (Data, Mean, imgs, rows*cols);                             CUERR
    TIMERSTOP(correction_kernel)

    // compute covariance matrix
    TIMERSTART(covariance_kernel)
    dim3 blocks(SDIV(rows*cols, 8), SDIV(rows*cols, 8));
    dim3 threads(8, 8, 1);
    shared_covariance_kernel<<<blocks, threads>>>
        (Data, Cov, imgs, rows*cols);                              CUERR
    TIMERSTOP(covariance_kernel)

    // transfer covariance back to host
    TIMERSTART(cov_D2H)
    cudaMemcpy(cov, Cov, sizeof(float)*rows*cols*rows*cols,
               cudaMemcpyDeviceToHost);                            CUERR
    TIMERSTOP(cov_D2H)

    // validate the result against the reference covariance matrix
    float * cov_check = nullptr;
    cudaMallocHost(&cov_check, sizeof(float)*rows*cols*rows*cols); CUERR
    load_binary(cov_check, rows*cols*rows*cols, "celebA_covariance.bin");

    for (uint64_t i = 0; i < rows*cols*rows*cols; i++)
        if ((cov[i]-cov_check[i])*(cov[i]-cov_check[i]) > 1E-4) {
            std::cout << "ERROR: at position " << i
                      << " of the COV matrix" << std::endl;
            break;
        }

    // get rid of the memory
    cudaFreeHost(data);                                            CUERR
    cudaFreeHost(cov);                                             CUERR
    cudaFreeHost(cov_check);                                       CUERR
    cudaFree(Data);                                                CUERR
    cudaFree(Mean);                                                CUERR
    cudaFree(Cov);                                                 CUERR

    std::cout << "CUDA programming is fun!" << std::endl;
}
Start time:
Thu 12 Apr 2018 16:11:00
End time:
Fri 01 Mar 2019 12:00:00
General test timeout:
10.0 seconds
Tests
Comment prefix
#
Given input
Expected output
CUDA programming is fun!