Science and Technology Research Institute remote research project: computer big data direction
Project category: Science and Engineering Internship Research
Enrollment quota: 20 people
Suitable for: college students
Activity time: enrollment is open year-round
(Questions before registering? Call customer service: 010-57952000)

Project venue:


Project purpose:

Through this course, students develop deep-learning tools for gravitational wave data processing and use a public cloud computing platform to process, analyze, mine, and visualize data.

Project cost:


First, the introduction

This internship program is designed for students who plan to apply to programs in computer science, electronics, automation, systems science, or related interdisciplinary fields. During the course, students are exposed to all aspects of big data science through coursework and self-study, and apply what they learn to a specific research project. Students develop deep-learning tools for gravitational wave data processing and use a public cloud computing platform to process, analyze, mine, and visualize data. The course is divided into theoretical study and hands-on development. Students also need strong self-study and research skills, because the instructor will assign a large number of exploratory topics to build good research habits. The project focuses on improving students' ability to formulate scientific problems, develop technical solutions, and think scientifically. After the internship, the instructor will issue a recommendation letter based on the student's performance.

Second, the content

The purpose of this research internship is to give students a better understanding of the engineering research process. The project covers a broad range of engineering research topics, including big data science, deep learning, virtualization technology, container technology, and NoSQL. Students first learn the project background of gravitational wave data processing and analysis and become familiar with a common development language and a cloud computing data processing platform. Next, the course focuses on the container virtualization technology that has become mainstream in the past two years: students deploy their completed projects into containers and experience the advantage of deploying once and running anywhere. Then, through hands-on practice with modern, highly scalable unstructured databases and in-memory computing, students see how current data technologies can simplify gravitational wave data processing. Finally, using a selection of modern visualization technologies, students see their own projects presented as finished products.

Third, the instructor's background

The instructor is a postdoctoral researcher at the Massachusetts Institute of Technology and a senior software architect at the Massachusetts Lab in the United States. In 2015, he was appointed Distinguished Professor at a leading university in China. The space laboratory research team at the Massachusetts Institute of Technology, where he worked full-time for five years, drew worldwide attention in 2016 with the discovery of gravitational waves. As one of the main leaders of the project's computing platform group, he was responsible for building the entire big data high-performance computing platform for gravitational wave detection, for data analysis, and for providing computing support using emerging virtualization technologies.

Fourth, eligibility and requirements

Outstanding undergraduate and graduate students, as well as talented high school students, who plan to apply for majors in computer science, electronic information, data analysis, applied mathematics, or related fields. To help students complete their research projects, the project team screens applicants through written tests and interviews.

Fifth, the itinerary

The remote research project lasts one month, and the specific dates can be arranged according to the students' needs and the progress of the project. The advantage of this format is that, for students who still have enough time before they apply, the instructor can help them complete one or more professional research tasks in a deeper, more comprehensive, and more systematic way. Students take part in the whole process of a research project, experience the sense of accomplishment that comes from solving scientific problems, and learn the background and latest developments of the field. In addition to the regular project discussion sessions, students can ask questions at any time during the project and receive professional guidance from the instructor, so that they experience in advance what a researcher's work and life are really like. The course schedule is as follows:

First week. Programming language, platform, and core tools: Learn the basic theory of big data and cloud computing, and understand the core problems of gravitational wave data processing. Depending on their programming experience, students get started with Python, Node.js, or Java, become familiar with the Linux or macOS platform and with managing code on GitHub, and study the implementation and use of mainstream open source tools in the big data field. On the weekend, students discuss and get answers to questions that came up during online study.
Second week. Virtualization technology learning and practice: Study in depth how modern virtualization technology is implemented, including the mechanisms behind virtual machines and containers; learn how advanced container scheduling platforms such as Mesos/Marathon or Kubernetes work; and study core container technologies such as the file system, automatic scaling, fault tolerance and disaster recovery, health checks, and RESTful API operation. Students are assigned classic papers and documentation to read. On the weekend, students discuss and get answers to questions that came up during the reading.
Third week. Storage technology learning and practice: Learn the core concepts of RDBMS, NoSQL, and NewSQL; cover the basic theory of non-relational databases and the pros and cons of the various database types; use Cassandra as a case study to introduce the principles of mainstream NoSQL databases; and containerize the Cassandra database using the container technology learned in the second week. A session is scheduled with students to discuss and answer problems encountered while building the system.
Fourth week. Processing and visualization technology learning and practice: Starting from in-memory computing, stream computing, and batch computing, learn three mainstream programming models (MapReduce, Pregel, GraphLab) and the corresponding data processing technologies Hadoop and Spark. Learn modern data visualization technology, using D3 or amCharts as a case study to explain the use and programming of visualization tools. Using the container technology from the second week, everything built so far, including this week's work, is packaged in a container for distribution. A session is scheduled with students to discuss and answer problems encountered while tuning the system.
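
For orientation, the MapReduce model named in the fourth week can be sketched in a few lines of plain Python. This is only an illustrative sketch and not course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`, `count_mapper`, `sum_reducer`) and the sample input labels are hypothetical, and a real job would run on a framework such as Hadoop or Spark rather than in a single process.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def count_mapper(line):
    """Hypothetical mapper: emit (token, 1) for every token in a line."""
    return [(token.lower(), 1) for token in line.split()]


def sum_reducer(key, values):
    """Hypothetical reducer: add up the counts emitted for one token."""
    return sum(values)


if __name__ == "__main__":
    # Made-up sample lines, used only to exercise the pipeline.
    lines = ["H1 L1 H1", "L1 V1"]
    print(map_reduce(lines, count_mapper, sum_reducer))
```

The mapper and reducer are passed in as plain functions, so the same three-phase skeleton can be reused for other aggregations by swapping only those two pieces.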