ECE 459: Programming for Performance
jzarnett / ece459
Hello! I'm enjoying the course so far. I was curious about what you would talk about when it came to Rust, so I looked a bit ahead. :)
Slide 29/38 in Lecture 35: Rust is out of date. It shows this example and says that it doesn't compile:
let mut s = String::from("one");
let r1 = &mut s;
let r2 = &mut s;
However, if you run this today, it will compile with no issues.
The reason is that about a year ago Rust added a feature called Non-Lexical Lifetimes (NLL). With NLL, the lifetime of a reference is no longer tied only to the extent of its scope. The lifetime now lasts only as long as the reference is used (up to the end of its scope, of course).
To make this code stop compiling, you have to mutate r1 after r2 is declared. This extends the lifetime of r1 past the declaration of r2, which means that creating the mutable reference r2 is now illegal.
let mut s = String::from("one");
let r1 = &mut s;
let r2 = &mut s;
// This line causes the code to stop compiling
r1.push_str(" two");
Some of the other slides also refer to references having to go out of scope in order for you to be able to create new &mut references. This is no longer 100% true because of NLL.
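For contrast with the failing example above, here is a minimal sketch of the ordering that NLL accepts: the first mutable borrow is used for the last time before the second one is created, even though both names are still in scope. The function name `sequential_mut_borrows` is made up for illustration and does not come from the slides.

```rust
/// Two mutable borrows of the same String that never overlap in use.
/// `sequential_mut_borrows` is a hypothetical name used only for this sketch.
fn sequential_mut_borrows() -> String {
    let mut s = String::from("one");

    let r1 = &mut s;
    r1.push_str(" two"); // last use of r1: under NLL its borrow ends here

    let r2 = &mut s; // accepted, because r1 is never used again
    r2.push_str(" three");

    s // both borrows have ended, so s can be moved out
}
```

Moving the `r1.push_str(" two")` line below the declaration of `r2` should bring back the "cannot borrow `s` as mutable more than once at a time" error.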
Some resources on NLL if you're interested:
Assignment 3 gives some timings for 500 and 5000 points. Are these accurate, or should they be for 500 * 64 and 5000 * 64 points?
https://github.com/jzarnett/ece459/blob/master/lectures/L22.tex#L187
( as the calculation of forces is called from the host, and such global functions can call device functions but not other global functions. Device functions can call only other device functions. So it makes it clear where the entry points are from host code. In some OOP-sense, you could consider the device functions to be ``private'', not that I encourage you to think that way.
apollographql/router#3686
https://www.magiroux.com/rust-jemalloc-profiling/
https://github.com/tikv/jemallocator
Could be used in flipped note or lecture note.
https://github.com/jzarnett/ece459/blob/master/lectures/L10-slides.tex#L70
Would x = 2 be another possible outcome? x is read as 1, x is assigned 42, and then x is assigned 2.
lectures/live-coding/L16/rayon-max-array/src/main.rs uses:
vec.par_iter().for_each(|n| {
    let mut previous_value = max.load(Ordering::SeqCst);
    if *n > previous_value {
        while max.compare_and_swap(previous_value, *n, Ordering::SeqCst) != previous_value {
            println!("Compare and swap was unsuccessful; retrying");
            previous_value = max.load(Ordering::SeqCst);
        }
    }
});
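which may have a race condition between previous_value and the new max: after a failed swap, previous_value is reloaded but *n is never compared against it again, so a smaller *n could overwrite a larger max. The code in L16.tex uses a loop (as recommended) instead: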
loop {
    let old = max.load(Ordering::SeqCst);
    if *n <= old {
        break;
    }
    let returned = max.compare_and_swap(old, *n, Ordering::SeqCst);
    if returned == old {
        println!("Swapped {} for {}.", n, old);
        break;
    }
}
Both code snippets were introduced in the same commit on October 14. Assigning to @jzarnett who wrote this code.
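As a rough sketch of the recommended loop, the version below re-checks `n` against the freshly observed value on every retry. It departs from the repository code in three ways, all of them assumptions made to keep the example self-contained: it uses `compare_exchange` (the non-deprecated replacement for `compare_and_swap` in the standard library), it uses `std::thread::scope` instead of rayon, and the names `update_max` and `parallel_max` are hypothetical.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

/// Raise `max` to at least `n`. The comparison `n > old` is repeated with
/// the latest observed value after every failed exchange.
/// (`update_max` is a hypothetical helper, not code from the course repo.)
fn update_max(max: &AtomicI64, n: i64) {
    let mut old = max.load(Ordering::SeqCst);
    while n > old {
        match max.compare_exchange(old, n, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => break,                // we installed n
            Err(current) => old = current, // someone else won; re-check against their value
        }
    }
}

/// Find the maximum of `values` using up to `workers` scoped threads.
/// Returns i64::MIN for an empty slice.
fn parallel_max(values: &[i64], workers: usize) -> i64 {
    let max = AtomicI64::new(i64::MIN);
    let workers = workers.max(1);
    // chunks() panics on a length of 0, so clamp to at least 1.
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for chunk in values.chunks(chunk_len) {
            let max = &max;
            s.spawn(move || {
                for &n in chunk {
                    update_max(max, n);
                }
            });
        }
    });

    max.load(Ordering::SeqCst)
}
```

If only the final maximum matters, `AtomicI64::fetch_max` in the standard library does the same update in a single call. The explicit loop is shown here because it is the pattern the issue is about.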
The hyperfine command in the lab 1 manual seems to be missing single quotes around the command to be benchmarked. I believe it should be hyperfine -i 'target/release/lab1 verify 100.txt'. I would open a pull request but the manual is only provided as a PDF.
in the grouping requests section:
since the logic to build a a larger request is almost certainly significantly more complex than the logic to build a small request.
duplicate "a"
Lecture 23 covers GPU password cracking, reduced-resource computation, and software transactional memory, but the slides and notes are only called "GPU Password Cracking". It might be a good idea to expand the title like you've done for other lectures
I think one of the big take-aways from this course is learning to use analysis tools. IMO, there are two parts to that:
This course is very useful and I really believe it'll help me in my career. I was wondering if the teaching team would consider compiling a list (the more detail the better) that consolidates and succinctly describes the tools that we've learned about. It would also serve as an invaluable resource for future students taking this course.
Should be my_lock.compare_and_swap(false, true, Ordering::SeqCst) == true instead of my_lock.compare_and_swap(false, true, Ordering::SeqCst) == false
Thanks to the anonymous student who reported it on Piazza.
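To illustrate why the comparison must be `== true`: `compare_and_swap` returns the previous value, so a return of `true` means the lock was already held and the caller has to keep spinning. The sketch below expresses the same condition with `compare_exchange` (the non-deprecated equivalent), where `Err` plays the role of "previous value was true". The names `spin_lock`, `spin_unlock`, and `locked_count` are hypothetical and are not taken from the lecture notes.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Spin until the flag changes from false (unlocked) to true (locked).
/// Err(_) means the flag was already true, i.e. someone else holds the lock.
fn spin_lock(lock: &AtomicBool) {
    while lock
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        hint::spin_loop();
    }
}

fn spin_unlock(lock: &AtomicBool) {
    lock.store(false, Ordering::SeqCst);
}

/// Each thread does a deliberately non-atomic read-modify-write on the
/// counter while holding the spinlock. If the lock is correct, no
/// increments are lost.
fn locked_count(threads: usize, iters: usize) -> usize {
    let lock = AtomicBool::new(false);
    let counter = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    spin_lock(&lock);
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    spin_unlock(&lock);
                }
            });
        }
    });

    counter.load(Ordering::SeqCst)
}
```

With the inverted condition (`== false`, or `.is_ok()` here), a thread would fall straight through when the lock is held and spin forever when it is free, which is the bug the issue reports.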
I've run into numerous broken links already... Let's help each other keep track of correct links. I'll begin:
L25: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html is broken.
L25: http://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/oss-performance-tools-183986.pdf is broken
L25: http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf is broken
L25: http://developer.amd.com/cpu/CodeAnalyst/assets/ISPASS2010_IBS_CA_abstract.pdf is broken
Found only some correct links:
http://developer.amd.com/wordpress/media/2012/10/AMD_IBS_paper_EN.pdf
student feedback: "The coding environment setup tutorial could be more robust though as it took significant time at the beginning and was discouraging way to start."
"As discussed, the CPU generates a memory address for a read or write operation. The address will be mapped to a page. Ideally, the page is found in the cache, because that would be faster. If the requested page is, in fact, in the cache, we call that a cache hit. If the page is not found in the cache, it is considered a cache miss. In case of a miss, we must load the page from memory, a comparatively slow operation. A page miss is also called a page fault."
Pages of virtual memory are 'cached' in page frames of physical main memory at page granularity, but that's done in software. The contents of physical main memory are in turn cached at line granularity by the L1/L2/... caches in hardware. I don't think that that's clear from this explanation, and calling a hardware cache miss a page fault is confusing.
Suggestion: inevitably --> inevitably be in L2 S17/32
Suggestion: the --> Send in L3 34/39