Methods in Open Science


The primary goal of this course is to survey the methods that scientists and analysts use to process, explore, and analyze data; to document, present, and communicate results; and to generate useful knowledge from raw data in a reproducible way.

This course has several broad components. The more conceptual material is presented mainly through web-based slides and modules. Annotations and explanatory text are provided throughout in the Notes section to the right of each slide. We also include embedded videos, interactive web content, links to web resources, references to publications, and book recommendations.

We have also provided materials for learning to code and analyze data in both Python and R. This material is intentionally redundant: you can complete the assignments in the language of your choosing or attempt them in both. Both languages are widely used in data science, and many practitioners are bilingual, so we have included examples in each. This material is presented as webpages with code examples and explanations, videos, and assignments. Please see the sequencing document for the suggested order in which to work through the material. We have also provided PDF versions of the lectures with the notes included.

This course is meant for those with no prior experience in data analytics or coding, so it is fine if these concepts are completely new to you. We have tried to focus on key concepts and hands-on applications. Once you have completed this course, you may want to take a deeper dive into some of the broad topics discussed.

After completing this course you will be able to:

  • explain the characteristics of data and the methods we use to extract useful information from them.
  • implement data cleaning, manipulation, and summarization to prepare raw data as input to additional analyses.
  • code at an intermediate level in the Python or R language/computational environment.
  • construct effective graphs and data visualizations.
  • execute and interpret statistical tests and assess the appropriateness of tests and input data for exploring a specific hypothesis.
  • critique modeling methods for making predictions and assess model output.
  • perform data science experiments to address specific questions using appropriate techniques and data.

This course was produced by West Virginia View (http://www.wvview.org/) with support from AmericaView (https://americaview.org/). This material is based upon work supported by the U.S. Geological Survey under Grant/Cooperative Agreement No. G18AP00077. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the U.S. Geological Survey. Mention of trade names or commercial products does not constitute their endorsement by the U.S. Geological Survey. This course and associated materials were also supported by the National Science Foundation (NSF) (Federal Award ID No. 2046059: “CAREER: Mapping Anthropocene Geomorphology with Deep Learning, Big Data Spatial Analytics, and LiDAR”).