UW-Madison CS 544: Big Data Systems
Welcome to Intro to Big Data Systems! We'll deploy and use distributed systems to store and analyze large datasets. Unstructured and structured approaches to storage will be covered. Analysis will involve learning new query languages, processing streaming data, and training machine learning models. Systems covered include Docker, PyTorch, HDFS, Spark, Cassandra, Kafka, and more.
Deploy distributed systems for data storage and analytics
Demonstrate competencies with tools and processes necessary for loading data into distributed storage systems
Write programs that use distributed platforms to efficiently analyze large datasets
Produce meaning from large datasets by training machine learning models in parallel or on distributed systems
Measure resource usage and overall cost of running distributed programs
Optimize distributed analytics programs to reduce resource consumption and program runtime
Demonstrate competencies with cloud services designed to store or analyze large datasets