hpce-2017-cw5's Issues

GPU vs CPU double-precision in gaussian_blur

Hi,

I am aware that there have been a few issues on this topic previously, but they don't seem to reach a conclusion. I have tried the OpenCL compilation flags mentioned in #35, but they don't seem to improve the results.

I have a sequential implementation (which works perfectly) and thought a similar approach might work on the GPU, but the reference/CPU and GPU results differ by a lot (>> 2).

Am I missing something really obvious?

heat_world bug?

In the heat_world puzzle, at the end of the ReferenceExecute function, the instruction buffer[index] = res; is missing after res=std::min(1.0f, std::max(-1.0f, res));.
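That is, the end of the loop would presumably read:

res=std::min(1.0f, std::max(-1.0f, res));
buffer[index] = res; // this store appears to be missing from the reference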

Can something executed in parallel be slower than a sequential version (with a slight change in code logic)?

Hi,

I have a more subjective question that came about while working on one of the puzzles. I was able to make a problem execute in parallel instead of sequentially, without taking on any 'known' overhead in the conversion. Yet it takes around 3 minutes to do the exact same thing in parallel, against a 1 minute execution time for the sequential version. I would love to be more specific about the problem, but that would go against the requirements of this assignment. Again: can something done in parallel, which is pretty much what is done in the sequential version, be slower?

Platform: Mac OS X (using the CPU right now)

Other tidbits: for the parallel version, user time is 1m38s and real time is 24.083 seconds; the sequential version has a 12 second real time and 10 second user time.
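As a general illustration (hypothetical code, not the poster's), per-task overhead is one common culprit: when the loop body is tiny, the cost of scheduling each task can exceed the work itself. A sketch of chunking the iteration space with an explicit grain size, assuming TBB as in the coursework:

#include <vector>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Hypothetical sketch: a trivial loop body parallelised with an explicit
// grain size, so each task receives a contiguous chunk of 4096 elements
// rather than a handful, amortising the scheduling overhead.
void scale_all(std::vector<double> &v)
{
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, v.size(), 4096), // grain size
        [&](const tbb::blocked_range<size_t> &r){
            for(size_t i=r.begin(); i!=r.end(); ++i){
                v[i] *= 2.0; // trivial per-element work
            }
        });
}

If the per-iteration work is this small, a sequential loop can easily beat a naively parallelised one.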

Creating new classes in provider

We are planning to move all common GPU-related code into a separate class in a separate file, then call the necessary functions from all the puzzle files.
For the testing on AWS, will all files be copied from the provider directory, i.e. is the approach described above feasible?

Caught exception : clEnqueueReadBuffer

I am running the Gaussian blur on the GPU.

Everything works fine until the read. Here is my read command:

queue.enqueueReadBuffer(buffOutput, CL_TRUE, 0, cbBuffer, &outputPixels[0] );

and declaration of outputPixels is as follows

std::vector<uint8_t> outputPixels(w*h, 0);

and cbBuffer is defined as

size_t cbBuffer=w*h;

When queue.enqueueReadBuffer is executed it raises an exception: clEnqueueReadBuffer. Do you have any idea why this might happen?
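One way to narrow this down (a sketch, assuming the cw3-style cl.hpp wrapper with __CL_ENABLE_EXCEPTIONS defined) is to catch cl::Error and print the raw status code:

try{
    queue.enqueueReadBuffer(buffOutput, CL_TRUE, 0, cbBuffer, &outputPixels[0]);
}catch(const cl::Error &e){
    // e.err() is the raw OpenCL status code; -30 is CL_INVALID_VALUE, which
    // typically means offset+size exceeds the size buffOutput was created with
    std::cerr << e.what() << " failed with code " << e.err() << std::endl;
    throw;
}

A common cause of CL_INVALID_VALUE here is buffOutput having been allocated with fewer than cbBuffer bytes, so it is worth checking the size passed when the cl::Buffer was constructed.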

fetch_and_add operation for tbb::atomic<double>

I have a simple loop which accumulates the result from each loop into a variable, something like:

int acc = 0;
for (...) { acc += something; }

Now I want to parallelize the loop, so I make the accumulation variable atomic, like so:

tbb::atomic<int> acc = 0;
parfor (...) { acc += something; }

Which works fine, as we have seen in CW4. However, when I try to do the same thing for an atomic<double> accumulator:

tbb::atomic<double> acc = 0;
parfor (...) { acc += something; }

The compilation fails with complaints about "no operator '+=' for atomic<double>". I looked it up and found this StackOverflow issue which mentions that indeed, atomic<double> does not support certain operations such as ++, +=, etc.

I tried getting around the problem by writing out the += equivalent in two steps:

tbb::atomic<double> acc = 0;
parfor (...) {
    double tmp = acc;
    acc = tmp + something;
}

Which compiles, but fails the test for correctness. Note that the sequential version of this exact same thing does pass correctness.

So I am wondering

  1. Why does this sort of fetch-and-add (+=) operation work sequentially but not in parallel, even though the accumulator is atomic?

  2. Is there any way I can get around this? (I also tried using other atomic types which do support fetch-and-add, such as int64_t in place of double, but expectedly this fails correctness even for the sequential version.)

Thanks in advance.
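For what it's worth, tbb::atomic<T> does provide compare_and_swap for 8-byte types such as double, so one workaround is a retry loop; the two-step version above fails because the load and the store are each atomic, but the read-modify-write pair as a whole is not. A sketch, with N and f(i) as hypothetical stand-ins for the loop bound and "something":

#include "tbb/atomic.h"
#include "tbb/parallel_for.h"

tbb::atomic<double> acc;
acc = 0.0;
tbb::parallel_for(0, N, [&](int i){
    double observed, desired;
    do{
        observed = acc;              // snapshot the current value
        desired  = observed + f(i);  // compute the update
        // retry if another task changed acc between the snapshot and the CAS
    }while(acc.compare_and_swap(desired, observed) != observed);
});

That said, the more idiomatic TBB answer is tbb::parallel_reduce, which gives each task a private accumulator and combines them at the end, avoiding contention on a single atomic entirely.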

[Feature] Count Us In Results to only include correct ones

In the code we are given, there is a script (bin/run_puzzle) that tests whether the output of the reference solution matches our solution ([run_puzzle], 1511188501.78, 2, Output is correct), which gives us some confidence that our "faster" implementation is correct.

In the README, it states that in the "Count Us In" tests: "These tests do not check for correctness"

I was wondering if we can make it so that the incorrect ones are not counted towards the results in the graphs (i.e. incorrect implementations will not contribute to the min/max/median). This is so that we can have an idea of what the median time is out of the correct solutions. If that's already the case, please ignore this issue.

Thank you.

Testing input sizes (and distributions)

Hi,

I was wondering if it is possible to publish approximate sizes of the testing inputs, or at least some statistics about the test set (median and quantiles), so we know what amount of work our algorithms will be dealing with.

Thanks!

OpenCL enqueueReadBuffer not copying back to host memory

I am implementing Mining and Random Projection in OpenCL. For random projection I was notified that the output is incorrect, and for mining it never achieves a solution.

Upon further inspection, it appears there is a problem with the memory transfer either to or from the GPU. More specifically, I tested this by initialising my C++ vector to all 0xFFFFFFFFFFFFFFFFull, and unconditionally setting the value to 0 in the OpenCL kernel.

std::vector<uint64_t> vOut(N_PARALLEL, 0xFFFFFFFFFFFFFFFFull);
queue.enqueueWriteBuffer(buffResult, CL_TRUE, 0, hpce_result, &vOut[0]);
...
queue.enqueueNDRangeKernel(kernel, offset, globalSize, localSize);
...
queue.enqueueReadBuffer(buffResult, CL_TRUE, 0, hpce_result, &vOut[0]);

I have been quite careful in following the steps outlined in coursework 3, and the sizes allocated for each buffer are correct (8 * number of elements in the vector, for uint64_t), so I was wondering if the problem is something non-code-related.

I copied the opencl_sdk folder from coursework 3 and made the following modifications to the makefile:

LDFLAGS += -Lopencl_sdk/lib/windows/x86_64
LDLIBS += -ltbb -lOpenCL

Any insight would be greatly appreciated.
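As a sanity check, printing which platform and device actually get selected can confirm whether the system-wide OpenCL.dll is picking up an unexpected driver. A sketch using the standard cl.hpp query calls:

std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
for(unsigned i=0; i<platforms.size(); i++){
    std::cerr << "Platform " << i << ": " << platforms[i].getInfo<CL_PLATFORM_NAME>() << "\n";
    std::vector<cl::Device> devices;
    platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices);
    for(unsigned j=0; j<devices.size(); j++){
        std::cerr << "  Device " << j << ": " << devices[j].getInfo<CL_DEVICE_NAME>() << "\n";
    }
}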

How to actually run the puzzles

Errr slightly embarrassing question...

It's not immediately clear to me how to actually run the compiled puzzles after going through the readme - I keep getting seg-faults on the provided implementation, which I guess means I'm just using them wrong...

Could someone provide some example use cases of the puzzles please?

Thanks! :)
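For reference, the serenity_now recipe posted elsewhere in this tracker shows the expected invocation pattern (puzzle name, scale, log level):

bin/run_puzzle gaussian_blur 100 2
bin/create_puzzle_input gaussian_blur 100 2 > w/gaussian_blur.in
cat w/gaussian_blur.in | bin/execute_puzzle 1 2 > w/gaussian_blur.ref.out
cat w/gaussian_blur.in | bin/execute_puzzle 0 2 > w/gaussian_blur.got.out
bin/compare_puzzle_output w/gaussian_blur.in w/gaussian_blur.ref.out w/gaussian_blur.got.out 2

(Judging from the logs posted in other issues, bin/execute_puzzle takes 1 for the reference implementation and 0 for your own, with the input supplied on stdin; make sure the w directory exists first.)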

Unknown exception crash of OpenCL

I have the following simple gaussian blur provider which, during the execute function, just tries to set up the GPU as we did in cw3 (it is pretty much copy-pasted code). This works fine on my partner's Mac, but silently crashes on my Windows machine (using the canned MinGW provided). It crashes when the context is initialised:

[image]

Running it with gdb I can see it is caused by an unknown exception, and the backtrace has led me to find that I might not be running the provided opencl_sdk/lib/windows/x86_64/ library copied from cw3, but the following library instead: /c/WINDOWS/SYSTEM32/OpenCL.dll

[image]

The relevant part of my makefile looks like:

[image]

and results in the following compilation:

[image]

I have also tried to force the library search path to point only at the location of the provided library, by adding LDFLAGS += -Wl,-rpath,opencl_sdk/lib/windows/x86_64, with the same result.

g2.2xlarge instance not launching in aws

[screenshot]
I get that when I try to launch the instance. Is anyone getting the same? I already followed their advice and asked to raise the limit, but what if that doesn't happen in due time?

makefile issues

A small part of the makefile was probably not updated for this year:

  1. no input file is specified for bin/compare_puzzle_output when the target is serenity_now_% - this causes segfaults
  2. the serenity_now target refers to puzzles from last year

These changes fix the issues on my side:

serenity_now_% : all
	mkdir -p w
	bin/run_puzzle $* 100 2
	bin/create_puzzle_input $* 100 2 > w/$*.in
	cat w/$*.in | bin/execute_puzzle 1 2 > w/$*.ref.out
	cat w/$*.in | bin/execute_puzzle 0 2 > w/$*.got.out
	bin/compare_puzzle_output w/$*.in w/$*.ref.out w/$*.got.out 2

serenity_now : $(foreach x,mining heat_world gaussian_blur edit_distance random_projection hold_time,serenity_now_$(x))

Performance Graphs

I don't really understand the performance graphs. What are all the dotted lines? Are they other people's runs? If so, what do the min/max mean?

Also, what is the time scale? I'm pretty sure it isn't milliseconds, because I imagine that no-one is hitting nanoseconds at any input scale.

edit_distance: Caught exception: std::bad_alloc

Running edit_distance for scaling>15000 throws a std::bad_alloc exception.

build command:
make bin/run_puzzle

execution command:
bin/run_puzzle edit_distance 20000 2

Affected: edit_distance.hpp, reference version.

Gaussian Blur CreateInput bug

It looks like the CreateInput function of the Gaussian Blur puzzle only creates vectors where all values are 0. On line 162 of gaussian_blur.hpp:

t=std::min(0.0f, std::max(1.0f, t));

This can be fixed by swapping the min and max functions.
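That is, the clamp on line 162 becomes:

t=std::max(0.0f, std::min(1.0f, t));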

Memory Efficiency

I just wanted to double-check that the only performance metric is speed, and that memory efficiency is not important?

Possible typo ?

In readme.md
" If a puzzle runs too long and is terminated, it does not count as being incorrect."

@m8pple, could you kindly clarify this?

Issue with mining threshold always set to 0

The problem was that the mining puzzle never actually completed. After debugging (with print statements), it all came down to the threshold being set to 0. I thought that 2^64 could be overflowing the integer, and changing the initialisation of the threshold to 2^63 as a first test fixed the problem. A final solution is to change this line (145 in mining.hpp):

params->threshold=(uint64_t)(std::pow(2.0, 64) / (scale*scale));

to:

params->threshold=(uint64_t)(std::numeric_limits<uint64_t>::max() / (scale*scale));

Lack of compilation/performance runs

I've been having great fun with the auto tests over the past few days, where certain implementations
were managing to (I think) exhaust memory on the machine, which then freezes the machine. I'm not
quite sure how this is happening (there is an outer memory jail in place), and haven't been able
to diagnose it or strengthen or add internal memory jails enough to stop it happening. Another
possibility is that the GPU driver is somehow taking out the machine - this happens a fair amount on
consumer devices, but I haven't seen the AWS GPUs take out the machine before.

Anyway, I've fallen back on just adding logic to not run any implementation where a test run
has already started, then every time the machine freezes just start the script from just after
the place it left off. This may take a while, as sometimes the machine gets into a mode where
you can't stop it from the Amazon console (yay - watch the money burn on a machine you can't
log into!), so you have to terminate then rebuild.

Gaussian Blur Constraints

Can I confirm with someone that there is an acceptable margin of error when validating the output of the blurring algorithm?

This snippet is taken from the reference gaussian_blur.hpp:

[image]

In other words, will the final grading for gaussian_blur be as tolerant as the comparison above?

edit_distance test crashes 1/5 times when outputs equal

When running the examples in https://github.com/HPCE/hpce-2017-cw5/issues/7 with edit_distance, I get "Outputs are equal." 4 times out of 5.
1 out of 5 times I get:

LogLevel = 2 -> 2
[run_puzzle], 1510598186.04, 2, Created log.
[run_puzzle], 1510598186.04, 2, Creating random input
./script/edit_distance_test.sh: line 4: 31352 Floating point exception(core dumped) bin/create_puzzle_input edit_distance 5 2 > w/input.bin
Caught exception : StdoutStream::Recv - End of file.
Caught exception : StdoutStream::Recv - End of file.
LogLevel = 2 -> 2
[execute_puzzle], 1510598186.21, 2, Created log.
[execute_puzzle], 1510598186.21, 2, Loading input w/input.bin
Caught exception : FileInStream::Recv - Not all data was recieved, m_offset=0, todo=4, errno=0

I know this error could be ignored, but I am curious to know how it could be fixed.

Apparently, in file_in_stream.hpp, the Recv function bails out because the return value of the _read function is 0 or negative (signifying an error).
On MSDN I see:
If the function tries to read at end of file, it returns 0. If fd is invalid, the file is not open for reading, or the file is locked, the invalid parameter handler is invoked, as described in Parameter Validation. If execution is allowed to continue, the function returns -1 and sets errno to EBADF.
fd is the first argument to the read function (read(m_fdRecv, pRead, todo)).
However, I am not sure how/when fd would be invalid 1 time out of 5.

Makefile: make all on MacOS - "library not found for -lrt"

On my Mac, I run make all, then I get the following error:

$ make all 
mkdir -p bin
c++ -std=c++11 -W -Wall  -g -O3 -I include -o bin/execute_puzzle src/execute_puzzle.cpp lib/libpuzzler.a  -lrt -Llib -lpuzzler
ld: library not found for -lrt
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bin/execute_puzzle] Error 1

I searched for the error online, and some sources say that macOS doesn't support librt.

Makefile:

ifeq ($(OS),Windows_NT)
LDLIBS += -lws2_32
else
LDLIBS += -lrt
endif

However, if I remove the LDLIBS += -lrt line then it compiles on my Mac. I was wondering if anyone else is experiencing similar problems?
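A sketch of a guard that keeps -lrt on Linux but drops it on macOS (assuming GNU make, which provides $(shell ...), and that uname is available):

ifeq ($(OS),Windows_NT)
LDLIBS += -lws2_32
else ifeq ($(shell uname -s),Darwin)
# macOS: the relevant symbols live in libSystem, so no -lrt is needed
else
LDLIBS += -lrt
endif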

Gaussian Blur: GPU precision problems

An OpenCL kernel has been run on both the CPU and the GPU of the same machine.
The CPU output is consistently correct, while the GPU output mostly fails the correctness test.

This has led me to believe that there might be a precision issue when using the GPU, as the output is only slightly wrong.
The problem is that we already use double precision for everything.

Anyone have any ideas?

EDIT: It would seem that adding the -cl-fp32-correctly-rounded-divide-sqrt flag to the kernel compiler fixes this problem. This seems a bit weird, as the fp32 in the flag would seem to indicate that it only applies to normal floats... I found the above flag from here
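For anyone else hitting this: the flag is passed as the options string of the program build call. A sketch, assuming the cw3-style cl.hpp wrapper where program and devices already exist:

program.build(devices, "-cl-fp32-correctly-rounded-divide-sqrt");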

random openCL WriteBuffer Exception

Hi,

I keep getting a random OpenCL WriteBuffer exception for the edit_distance puzzle. I have tried experimenting at various scales (from 10 to 10,000) but don't see a pattern to the exception.

[image]

Has anyone faced a similar issue?

Is make serenity_now unstable?

I am working on the gaussian_blur.hpp puzzle. After making some modifications to the code, I run make serenity_now (after doing make all) and I get the following results:

[screenshot]

run_puzzle reports that the output is correct, and then compare_puzzle reports that the outputs are different.

I then re-run make serenity_now without re-building and I get the following results:

[screenshot]

run_puzzle reports that the output is correct, and then compare_puzzle reports that the outputs are correct.

So my question is: how is it possible for one run of the test to show that the outputs are different and the next run to show that they are correct?

I was expecting the result to be wrong, since I deliberately parallelised over a loop that should lead to a race condition:

x = 0;
par_for i=0:N {
    contribution = some calculation;
    x += contribution;   // unsynchronised update of shared x
}

Unexpected behaviour of compare outputs?

So I am using

make serenity_now_gaussian_blur

to run and test the correctness of my modified version of the gaussian_blur Execute function. For this, I have created an ExecuteV2 function which is selected through an environment variable.

As a sanity check, I set the acc variable to zero right before the output line, expecting to see it fail the comparison test.

	void ExecuteV2(
		puzzler::ILog *log,
		const puzzler::GaussianBlurInput *pInput,
		puzzler::GaussianBlurOutput *pOutput
		) const
	{
		log->LogInfo("I am V2, I AM A HUGE MISTAKE");
		pOutput->pixels.resize(pInput->width * pInput->height);

		for(int xOut=0; xOut < (int)pInput->width; xOut++){
			log->LogVerbose("column = %u", xOut);
			for(int yOut=0; yOut < (int)pInput->height; yOut++){
				log->LogDebug("row = %u", yOut);

				double acc=0.0;
				for(int xIn=0; xIn < (int)pInput->width; xIn++){
					for(int yIn=0; yIn < (int)pInput->height; yIn++){
						double contrib = hashedCoeff(xIn-xOut, yIn-yOut, pInput->radius);
						acc += contrib * pInput->pixels[yIn*pInput->width+xIn];
					}
				}

				if(acc<0){
					acc=0;
				}else if(acc>255){
					acc=255;
				}

				acc=0; // INTENTIONALLY MAKING A HUGE MISTAKE, PLEASE REMOVE LATER
				pOutput->pixels[ pInput->width*yOut+xOut ] = (uint8_t)acc;
			}
		}
	}

To my surprise, it passes the test!

bin/run_puzzle gaussian_blur 100 2
LogLevel = 2 -> 2
[run_puzzle], 1510947330.43, 2, Created log.
[run_puzzle], 1510947330.43, 2, Creating random input
[run_puzzle], 1510947330.43, 2, Executing puzzle
[run_puzzle], 1510947330.43, 2, I am V2, I AM A HUGE MISTAKE
[run_puzzle], 1510947330.43, 2, Executing reference
[run_puzzle], 1510947341.20, 2, Checking output
[run_puzzle], 1510947341.20, 2, Output is correct
bin/create_puzzle_input gaussian_blur 100 2 > w/gaussian_blur.in
LogLevel = 2 -> 2
[run_puzzle], 1510947341.20, 2, Created log.
[run_puzzle], 1510947341.20, 2, Creating random input
[run_puzzle], 1510947341.20, 2, Writing data to stdout
cat w/gaussian_blur.in | bin/execute_puzzle 1 2 > w/gaussian_blur.ref.out
[execute_puzzle], 1510947341.21, 2, Created log.
[execute_puzzle], 1510947341.21, 2, Loaded input, puzzle=gaussian_blur
[execute_puzzle], 1510947341.21, 2, Begin reference
[execute_puzzle], 1510947352.14, 2, Finished reference
cat w/gaussian_blur.in | bin/execute_puzzle 0 2 > w/gaussian_blur.got.out
[execute_puzzle], 1510947352.14, 2, Created log.
[execute_puzzle], 1510947352.14, 2, Loaded input, puzzle=gaussian_blur
[execute_puzzle], 1510947352.14, 2, Begin execution
[execute_puzzle], 1510947352.14, 2, I am V2, I AM A HUGE MISTAKE
[execute_puzzle], 1510947352.14, 2, Finished execution
bin/compare_puzzle_output w/gaussian_blur.in w/gaussian_blur.ref.out w/gaussian_blur.got.out 2
LogLevel = 2 -> 2
[execute_puzzle], 1510947352.15, 2, Created log.
[execute_puzzle], 1510947352.15, 2, Loading input w/gaussian_blur.in
[execute_puzzle], 1510947352.15, 2, Creating puzzle gaussian_blur to match input
[execute_puzzle], 1510947352.15, 2, Loading reference w/gaussian_blur.ref.out
[execute_puzzle], 1510947352.15, 2, Loading got w/gaussian_blur.got.out
[execute_puzzle], 1510947352.15, 2, Outputs are equal.

I tried setting acc to a different value, for example 1234, which DOES fail the test as expected.

bin/run_puzzle gaussian_blur 100 2
LogLevel = 2 -> 2
[run_puzzle], 1510947063.43, 2, Created log.
[run_puzzle], 1510947063.43, 2, Creating random input
[run_puzzle], 1510947063.43, 2, Executing puzzle
[run_puzzle], 1510947063.43, 2, I am V2, I AM A HUGE MISTAKE
[run_puzzle], 1510947063.43, 2, Executing reference
[run_puzzle], 1510947074.04, 2, Checking output
[run_puzzle], 1510947074.04, 0, Output is not correct.
makefile:26: recipe for target 'serenity_now_gaussian_blur' failed
make: *** [serenity_now_gaussian_blur] Error 1

Is there a mistake somewhere or did I accidentally stumble upon the secret trick to this puzzle?

target platform

According to the readme, our code will be benchmarked on a g2.2xlarge instance. It occurs to me that this instance type is not listed in AWS. It could be that it has been replaced (or am I not looking in the right place?). If it has been replaced, what is the designated platform this year? Or, where can I find and launch the g2.2xlarge?

OpenCL and Random Projection

I'm having trouble with my implementation of kernel code to speed up the random projection puzzle.

Having put the contents of MakeProjection in the kernel (including the for loops), I managed to obtain the correct output with n=200.

However, this implementation is slow. I'm not sure how to address that issue here, so I'll address another one.

When running the puzzle with n>500 (just a test value), the program aborts. I believe this is due to the buffer size, 4*n*n, exceeding the maximum buffer size. Our CL library doesn't contain a call to GetDeviceInfo, so I can't confirm this. I haven't seen others getting this issue, so I assume my implementation is wrong. My reason for using the buffer is to pass a reference to m, aka proj, to the kernel.

Can someone help me out?
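Even if the coursework wrapper has no GetDeviceInfo call, the cl.hpp header's templated getInfo can query the limit directly. A sketch (the platform/device indices are placeholders for however the device is actually selected):

std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
std::vector<cl::Device> devices;
platforms.at(0).getDevices(CL_DEVICE_TYPE_ALL, &devices);
cl_ulong maxAlloc = devices.at(0).getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>();
std::cerr << "CL_DEVICE_MAX_MEM_ALLOC_SIZE = " << maxAlloc << " bytes\n";

If 4*n*n exceeds that value, the allocation would indeed be expected to fail.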

Possible change of coeff() in gaussian_blur

Hi,

@m8pple - Can we assume that the definition of this function -

double coeff(int dx, int dy, double r) const
{
    return exp(- (dx*dx+dy*dy) / (2*r) ) / (2 * 3.1415926535897932384626433832795 * r);
}

will remain the same when our code is tested. The reason I ask: our implementation may not call this function itself, but something similar, in order to achieve our goal. If the definition of this function were to change, it would render my custom function incorrect.

Thanks!

random_projection output is just zeros

It seems that for an input bigger than 16, random_projection produces an output which is just a vector filled with zeros. Is this expected? In that case, are we expected to check correctness using small inputs and then evaluate the speedup on large inputs?

Performance benchmark plot y-axis

Regarding the plots of individual puzzle execution performance, is the y-axis really in milliseconds?
As of now it looks like large problems are getting solved in a matter of microseconds.

Is the y-axis scale arbitrary?

AWS: OpenCL SELECT_PLATFORM

I was wondering how we should control the choice of platform+device when running on the g2.2xlarge. I currently use some environment variables (with the surprising cw3 names, HPCE_SELECT_PLATFORM etc.). Is that supported (will the benchmark make the best choice, and how do I know?), or should I test myself on the AMI and then hardcode my choices?

I would like to hear your advice, since I think option one can be tricky and I don't like the latter idea either (portability).
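For reference, the cw3-style selection is just an environment-variable lookup on the host side, along these lines (a sketch; the variable name is the one mentioned above):

#include <cstdlib>

int selectPlatform = 0; // default to the first platform when the variable is unset
if(const char *v = std::getenv("HPCE_SELECT_PLATFORM")){
    selectPlatform = std::atoi(v);
}

so nothing stops an implementation from honouring the variable when present and falling back to a sensible default otherwise.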

tbb does not execute all of its tasks in edit_distance

I run this optimised code for edit_distance:

tbb::parallel_for( start, end, [&](int j) {

    i = end2 - (j - start);  // end2 defined outside tbb

    if( s[i-1] == t[j-1] ){
        d(i,j) = d(i-1,j-1);
    }else{
        d(i,j) = 1 + std::min(std::min( d(i-1,j), d(i,j-1) ), d(i-1,j-1) );
    }

} );

With scales from 1 to 20, the reference and optimised outputs always match.
From 20 onwards, the outputs are sometimes not correct.

When I look into why, it seems some random indexes in the middle of the table have the value '0' (whereas they are surrounded by positive numbers only). An example for scale 20:

[screenshot]

This could happen if tbb had not executed this task because I did not lay out my iteration space properly.

However, when I look at the list of indexes tbb went through, the '0'-valued indexes are there, which means tbb ran the task for that value of (i,j) but did not assign d(i,j) a value in the if/else.

How is that possible?
Why would a tbb task terminate before doing all it is supposed to do?
Should I use a more primitive form of tbb to have more control?
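One thing worth ruling out (an assumption, since the surrounding code isn't shown): if i is declared outside the lambda and captured by reference, every task writes to the same variable, and a task can read an i that another task has already overwritten, which would produce exactly these sporadic wrong cells. A version with a task-local index would be:

tbb::parallel_for( start, end, [&](int j) {
    int i = end2 - (j - start);  // local to this task; a shared i would race
    if( s[i-1] == t[j-1] ){
        d(i,j) = d(i-1,j-1);
    }else{
        d(i,j) = 1 + std::min(std::min( d(i-1,j), d(i,j-1) ), d(i-1,j-1) );
    }
} );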

OS X opencl kernel run time compile errors not being thrown

Has anybody had the issue on an OS X machine where the OpenCL compiler doesn't throw compile errors when an incorrect kernel is passed in? Right now I can change the kernel to be completely incorrect (for instance, not declaring a variable that's used), and running it doesn't throw any compile errors; the program just runs incorrectly.

I've tried changing the kernel to one that I know beforehand is correct, and the program executes as expected, which should mean that the kernel is being read and compiled.
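A sketch of an explicit check (assuming the cw3-style cl.hpp wrapper, with program and devices already set up) that inspects the build status and log rather than relying on an exception being thrown:

cl_int err = program.build(devices, "");
for(unsigned i=0; i<devices.size(); i++){
    // CL_PROGRAM_BUILD_STATUS is CL_BUILD_ERROR on a failed compile,
    // and the build log contains the compiler diagnostics
    std::cerr << "Status: " << program.getBuildInfo<CL_PROGRAM_BUILD_STATUS>(devices[i]) << "\n";
    std::cerr << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[i]) << "\n";
}
if(err != CL_SUCCESS){
    throw std::runtime_error("OpenCL kernel build failed");
}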

Large variance in execution time for mining

While working on the mining puzzle, I noticed that its performance tends to vary quite a lot.
So I tried timing 5 runs of the reference execution with the exact same input:

[screenshot]

The reference execution can be as fast as 2 seconds or as slow as 17 seconds in this case. I suspect this is because the seed value for the RNG differs for each execution, as seen in std::mt19937_64 iter(time(0));. Are we expected to improve the original code such that our improvement is significantly better even though its result may be variable as well? Or are we expected to measure the average execution time of our programme?

Gaussian Blur calculation

It seems that the reference execute function of the gaussian blur puzzle only takes into account the first column of the input image:

acc += contrib * pInput->pixels[yIn*pInput->width];

Consequently, the output image should be a blurred repetition of the first column - am I missing something here?
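(For comparison, the per-pixel indexing one would expect, and which appears in the ExecuteV2 listing elsewhere in this tracker, is:

acc += contrib * pInput->pixels[yIn*pInput->width+xIn];

i.e. the xIn term is what is missing from the reference line quoted above.)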

Development: use a dev branch or v1..v2..v3 ?

  1. When optimising a puzzle, is it better to create v1, v2, v3 etc. in the same directory, like in CW3, or is it better to keep one version of the puzzle in the provider directory and use git to submit different versions (a 'dev branch') on top of each other?

  2. To use a dev branch, what is the procedure?

  3. To use v1, v2, v3, where should the existence of the new .cpp files be declared? I can't find a file in 'provider' where the different files/functions can be listed (like the factory in cw3).

OpenCL kernel outputs zeros when using GPU but not CPU

For the Random Projection puzzle, I've used OpenCL to perform part of the puzzle. The code performs correctly when the CPU runs it, but always fails on the GPU (the output becomes zero for all values).

I believe the issue is that a non-zero parameter (of type double) which is being passed to the kernel always equals zero within the kernel function. This is the case even when the value being passed is explicitly set to a non-zero value. For example, when I test the program with double p=100.0; and pass p to the kernel using kernel.setArg(0, p);, the value of p becomes 0 when using the GPU but remains 100 when using the CPU. Does anyone know why p is being set to zero on the GPU?

Again, just to be clear: the OpenCL code works correctly when the CPU device is selected, but not when the GPU device is.
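One possibility worth ruling out (an assumption, not a confirmed diagnosis): if the GPU does not support doubles, kernels using the double type can misbehave silently. The extension string can be checked like this, with device standing for the selected cl::Device:

std::string exts = device.getInfo<CL_DEVICE_EXTENSIONS>();
bool hasFp64 = exts.find("cl_khr_fp64") != std::string::npos;
std::cerr << "cl_khr_fp64 supported: " << (hasFp64 ? "yes" : "no") << "\n";

If fp64 is unsupported, passing the value as a float (and declaring the kernel argument float) is the usual fallback.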

random_projection division bug?

In random_projection, the reference solution contains the line
double p=16/n;

where n is an int. This integer division truncates to 0 when n > 16, at least on my machine. Hence, the output for this puzzle becomes all 0. Is this the correct behaviour?
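For reference, the truncation disappears as soon as either operand is floating point:

double p = 16.0/n; // 16/n is integer division and yields 0 for n > 16; 16.0/n does not

(whether the integer division is intended reference behaviour is of course the question here).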

separation 1D

I am struggling to find u and v from S (knowing S = u * v, where S is square and u and v are 1D vectors).

Any help on how to do this?

Regarding Modification of Functions

I understand that we are only supposed to change Execute, but is it possible to modify or override functions used within the Execute function? For instance, MakeProjection() in random_projection.

Is the makefile out of date?

In the top level directory, the makefile contains the following build target:

serenity_now : $(foreach x,julia ising_spin logic_sim random_walk,serenity_now_$(x))

This would build and execute the reference and our versions of the puzzles "julia", "ising_spin", etc. Looking at the 2016 repository, these were the puzzles used that year. Does this mean the makefile may be out of date?
