UC Carpentries

September 15-25, 2025

8:30 am - 12:30 pm PST

Instructors: Ryan Horne (UCLA), Jose Niño Muriel (UCSB), Leigh Phan (UCLA), Jamie Jamison (UCLA), Derek Devnich (UCM), Reid Otsuji (UCSD), Cody Hennesy (UCB), Geoff Boushey (UCSF), Kaija Gahm (UCLA), Tim Dennis (UCLA), Julien Brun (UCSB), Jade Li (UCSB)

Helpers: Kristi Liu (UCSB), Hannah Sutherland (UCLA), Leigh Phan (UCLA), Misha Coleman (UCB), Geno Sanchez (UCLA), Reid Otsuji (UCSD), Alessandra Vidal Meza (UCSB), Kristian Allen (UCLA), Hind Al Ali (UCSB), Celeste Allaband (UCSD)

Register

To attend, please register here. You will receive a Zoom link for each session selected.

General Information

The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.

Want to learn more and stay engaged with The Carpentries? Carpentries Clippings is The Carpentries' biweekly newsletter, where we share community news, community job postings, and more. Sign up to receive future editions and read our full archive: https://carpentries.org/newsletter/

Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Online.

When: September 15-25, 2025; 8:30 am - 12:30 pm PST. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody.

We are dedicated to providing a positive and accessible learning environment for all. We do not require participants to provide documentation of disabilities or disclose any unnecessary personal information. However, we do want to help create an inclusive, accessible experience for all participants. We encourage you to share any information that would be helpful to make your Carpentries experience accessible. To request an accommodation for this workshop, please fill out the accommodation request form. If you have questions or need assistance with the accommodation form please email us.

Glosario is a multilingual glossary for computing and data science terms. It helps learners who attend workshops and use our lessons make sense of computational and programming jargon written in English by offering it in their native language. Translating data science terms also gives Carpentries Instructors a teaching tool for reducing barriers for their learners.

Workshop Recordings: Carpentries workshops are designed to be interactive rather than lecture-based, with lessons that build upon one another. To foster a positive online learning environment, we strongly recommend that participants join in real time. As a result, workshop recordings are not recommended and may not be available to learners.

Contact: Please email datascience+carpentries@ucla.edu for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Collaborative Notes

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

Day 1 (Tidy Data)

Schedule for Day 1: Tidy Data on Mon, Sep 15, 2025
8:30 AM - 10:00 AM    Using Spreadsheet Programs for Data Organization
                      Formatting Data Tables in Spreadsheets
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Formatting Problems
                      Dates as Data
11:30 AM - 12:20 PM   Basic Quality Assurance and Control
                      Exporting Data from Spreadsheets
                      Caveats of Popular Data and File Formats

Day 2 (OpenRefine)

Schedule for Day 2: OpenRefine on Tue, Sep 16, 2025
8:30 AM - 10:00 AM    Introduction to OpenRefine
                      Working with OpenRefine Projects
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Faceting and Filtering Data
                      Clustering and Transformations
11:30 AM - 12:20 PM   Exporting and Saving Data
                      Advanced Data Operations

Day 3 (Unix/Command Line)

Schedule for Day 3: Unix on Wed, Sep 17, 2025
8:30 AM - 10:00 AM    Introducing the Shell
                      Navigating Files and Directories
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Working With Files and Directories
                      Pipes and Filters
11:30 AM - 12:20 PM   Loops
                      Shell Scripts
                      Finding Things

Day 4 (Git)

Schedule for Day 4: Git on Thu, Sep 18, 2025
8:30 AM - 10:00 AM    Navigating Git Repositories
                      Recording Changes to Files
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Viewing the History of Changes
                      Undoing Changes
11:30 AM - 12:20 PM   Working with Remotes
                      Collaborating with Others

Day 5 (SQL)

Schedule for Day 5: SQL on Fri, Sep 19, 2025
8:30 AM - 10:00 AM    Using Databases and SQL
                      Selecting Data
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Filtering Data
                      Calculating New Values
11:30 AM - 12:20 PM   Aggregating Data
                      Combining Data

Day 6 (Python: Part 1)

Schedule for Day 6: Python: Part 1 on Mon, Sep 22, 2025
8:30 AM - 10:00 AM    Running and Quitting
                      Variables and Assignment
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Data Types and Type Conversion
                      Built-in Functions and Help
11:30 AM - 12:20 PM   Conditionals
                      Loops

Day 7 (Python: Part 2)

Schedule for Day 7: Python: Part 2 on Tue, Sep 23, 2025
8:30 AM - 10:00 AM    Functions
                      Libraries
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Reading Tabular Data into DataFrames
                      Pandas DataFrames
11:30 AM - 12:20 PM   Plotting

Day 8 (R: Part 1)

Schedule for Day 8: R: Part 1 on Wed, Sep 24, 2025
8:30 AM - 10:00 AM    Introduction to R and RStudio
                      Project Management With RStudio
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Seeking Help
                      Data Structures
11:30 AM - 12:20 PM   Exploring Data Frames
                      Subsetting Data

Day 9 (R: Part 2)

Schedule for Day 9: R: Part 2 on Thu, Sep 25, 2025
8:30 AM - 10:00 AM    Vectorization
                      Dataframe Manipulation with dplyr
10:00 AM - 10:10 AM   Break
10:10 AM - 11:30 AM   Creating Publication-Quality Graphics with ggplot2
                      Producing Reports With knitr
11:30 AM - 12:20 PM

Setup

To participate in a Software Carpentry workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

Spreadsheet Software

To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, OpenOffice.org, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.

For this lesson we will be using Excel, but you can use Google Sheets as well. If you don't have a spreadsheet program already, you can install LibreOffice by following the instructions for your operating system below.

  Windows

  • Download the Installer: Install LibreOffice by going to the installation page. The version for Windows should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don't need to make one. Your download should begin automatically.
  • Install LibreOffice: Once the installer is downloaded, double click on it and it should install.

  macOS

  • Download the Installer: Install LibreOffice by going to the installation page. The version for Mac OS should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don't need to make one. Your download should begin automatically.
  • Install LibreOffice: The file LibreOffice_X.X.X_MacOS_x86-64 (whichever version of LibreOffice you have selected) should have been downloaded. Double click on this file, and LibreOffice will be installed.

  Linux

  • Download the Installer: Install LibreOffice by going to the installation page. The version for Linux should automatically be selected. Click Download. You will go to a page that asks about a donation, but you don't need to make one. Your download should begin automatically.
  • Install LibreOffice: Once the installer is downloaded, double click on it and it should install.

    OpenRefine

    For this lesson you will need OpenRefine and a web browser. Note: this is a Java program that runs on your machine (not in the cloud). It runs inside a web browser, but no web connection is needed.

    Windows

    1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.
    2. Download software from http://openrefine.org/.
    3. Create a new directory called OpenRefine.
    4. Unzip the downloaded file into the OpenRefine directory by right-clicking and selecting "Extract ...".
    5. Go to your newly created OpenRefine directory.
    6. Launch OpenRefine by clicking openrefine.exe (this will launch a command prompt window, but you can ignore that - just wait for OpenRefine to open in the browser).
    7. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

    macOS

    1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It may not run correctly in Safari.
    2. Download software from http://openrefine.org/.
    3. Create a new directory called OpenRefine.
    4. Unzip the downloaded file into the OpenRefine directory by double-clicking it.
    5. Go to your newly created OpenRefine directory.
    6. Launch OpenRefine by dragging the icon into the Applications folder.
    7. Use Ctrl-click/Open ... to launch it.
    8. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

    Linux

    1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser.
    2. Download software from http://openrefine.org/.
    3. Make a directory called OpenRefine.
    4. Unzip the downloaded file into the OpenRefine directory.
    5. Go to your newly created OpenRefine directory.
    6. Launch OpenRefine by entering ./refine into the terminal within the OpenRefine directory.
    7. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

    SQLite

    SQL is a specialized programming language used with databases. You have two options for SQL: you can use a database manager called SQLite or install DB Browser for SQLite.

    On Windows:

    • Open "Git Bash" from the Start menu.
    • Copy the following command:

        curl -fsSL https://jmjamison.github.io/2025-09-15-UC/getsql.sh | bash

    • Paste it into the window that Git Bash opened. If you're unsure, ask an instructor for help.
    • You should see something like 3.50.2 2025-06-28...

    If you want to do this manually, download sqlite3, make a bin directory in your home directory, unzip sqlite3, move it into the bin directory, and then add the bin directory to your path.

    If you already have the Anaconda Distribution installed, sqlite is preinstalled. You can check by:

    • opening an Anaconda Prompt and typing sqlite3
    • confirming that you see something like SQLite version 3.50.2 2025-06-28 14:00:48

    SQLite comes pre-installed on macOS.

    SQLite comes pre-installed on Linux.

    DB Browser for SQLite

    DB Browser for SQLite is a visual tool to create, edit, and query SQLite databases. SQLite is included with DB Browser for SQLite, so it does not have to be installed separately.

    • There are a few options for Windows, but most modern computers can use the Standard installer for 64-bit Windows version. The .zip (no installer) can be run directly from the folder, after extracting the contents of the zip file. It will not show up in the Start menu.
    • There are also two options for MacOS. Most people should use the DB Browser for SQLite (Universal) installer.
    Launch DB Browser for SQLite to confirm that the installation was successful.

    The Bash Shell

    Bash is a commonly-used shell that gives you the power to do tasks more quickly.

    Please install Git for Windows using the instructions below.

    1. Download the Git for Windows installer.
    2. Run the installer and follow the steps below:
      1. Click on "Next" four times (two times if you've previously installed Git). You don't need to change anything in the Information, location, components, and start menu screens.
      2. From the dropdown menu, "Choosing the default editor used by Git", select "Use the Nano editor by default" (NOTE: you will need to scroll up to find it) and click on "Next".
      3. On the page that says "Adjusting the name of the initial branch in new repositories", ensure that "Let Git decide" is selected. This will ensure the highest level of compatibility for our lessons.
      4. Ensure that "Git from the command line and also from 3rd-party software" is selected and click on "Next". (If you don't do this Git Bash will not work properly, requiring you to remove the Git Bash installation, re-run the installer and to select the "Git from the command line and also from 3rd-party software" option.)
      5. Select "Use bundled OpenSSH".
      6. Ensure that "Use the native Windows Secure Channel Library" is selected and click on "Next".
      7. Ensure that "Checkout Windows-style, commit Unix-style line endings" is selected and click on "Next".
      8. Ensure that "Use Windows' default console window" is selected and click on "Next".
      9. Ensure that "Default (fast-forward or merge)" is selected and click on "Next".
      10. Ensure that "Git Credential Manager" is selected and click on "Next".
      11. Ensure that "Enable file system caching" is selected and click on "Next".
      12. Click on "Install".
      13. Click on "Finish" or "Next".
    3. If your "HOME" environment variable is not set (or you don't know what this is):
      1. Open command prompt (Open Start Menu then type cmd and press Enter)
      2. Type the following line into the command prompt window exactly as shown:

        setx HOME "%USERPROFILE%"

      3. Press Enter. You should see SUCCESS: Specified value was saved.
      4. Quit command prompt by typing exit then pressing Enter.

    This will provide you with both Git and Bash in the Git Bash program.

    Video Tutorial

    The default shell in macOS Catalina (10.15) and newer versions is Zsh, but Bash is available in all versions, so there is no need to install anything. You can access Bash from the Terminal (found in /Applications/Utilities). See the Git installation video tutorial for an example on how to open the Terminal. You may want to keep Terminal in your dock for this workshop.

    To see if your default shell is Bash, type echo $SHELL in Terminal and press the Return key. If the message printed does not end with '/bash', then your default is something else; you can change your current shell to Bash by typing bash and then pressing Return. To check your current shell, type echo $0 and press Return.

    To change your default shell to Bash, type chsh -s /bin/bash and press the Return key, then reboot for the change to take effect. To change your default back to Zsh, type chsh -s /bin/zsh, press the Return key and reboot. To check available shells, type cat /etc/shells.

    On Linux, the default shell is usually Bash and there is usually no need to install anything. You can check or change your shell in a terminal with the same echo $SHELL, echo $0, chsh and cat /etc/shells commands described for macOS above.

    Git

    Git is a version control system that lets you track who made changes to what when and has options for easily updating a shared or public version of your code on github.com. You will need a supported web browser.

    You will need a GitHub account from github.com for parts of the Git lesson, specifically episodes 7 & 8.

    1. Go to github.com and follow the "Sign up" link at the top-right of the window.
    2. Follow the instructions to create an account.
    3. Verify your email address on GitHub.
    4. Configure multifactor authentication.

    On macOS, please open the Terminal app, type git --version and press Enter/Return. If Git is not installed already, follow the instructions to install the "command line developer tools". Do not click "Get Xcode", because that will take too long and is not necessary for our Git lesson. After installing these tools, there won't be anything in your /Applications folder, as they and Git are command line programs. For older versions of OS X (10.5-10.8), use the most recent available installer labelled "snow-leopard" available here. (Note: this project is no longer maintained.) Because this installer is not signed by the developer, you may have to right click (control click) on the .pkg file, click Open, and click Open in the pop-up dialog.

    If Git is not already available on your machine you can try to install it via your distro's package manager. For Debian/Ubuntu run sudo apt-get install git and for Fedora run sudo dnf install git.

    Preparing Your Working Directory

    We'll do our work in the Desktop folder so make sure you change your working directory to it with:
           $ cd
           $ cd Desktop
      

    Text Editor

    When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of key words. The default text editor on macOS and Linux is usually set to Vim, which is not famous for being intuitive. If you accidentally find yourself stuck in it, hit the Esc key, followed by :+q+! (colon, lower-case 'q', exclamation mark), then hit Return to return to the shell.

    nano is a basic editor and the default that instructors use in the workshop.

    • Windows: It is installed along with Git.
    • macOS: It should be pre-installed. See the Git installation video tutorial for an example on how to open nano.
    • Linux: It should be pre-installed.

    R

    R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R in our lessons, we typically use RStudio. R and RStudio are two separate pieces of software.

    Install R by:

    • Navigating to CRAN and following the instructions outlined there, using your package manager. We have reproduced the commands below:
      Use the terminal command prompt to type/copy-and-paste these commands in, pressing Enter after each line to run the command.
      Do not type the lines that start with #; these are comments and are not part of the commands.
          # update indices
          sudo apt update -qq
      
          # install two helper packages we need
          sudo apt install --no-install-recommends software-properties-common dirmngr
      
          # add the signing key for these repos
          wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
      
          # add the repo from CRAN
          sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
      
          # install R itself
          sudo apt install --no-install-recommends r-base
                      
    • Installing the RStudio Server IDE:
          # install dependencies
          sudo apt install -y r-base-core r-recommended r-base-dev gdebi-core build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev
          sudo apt install --no-install-recommends gdebi-core
      
          # cd ~/Downloads
      
          # download the latest RStudio Server .deb file
          wget https://download2.rstudio.org/server/jammy/amd64/rstudio-server-2025.05.1-513-amd64.deb
      
          # install the .deb file
          sudo gdebi rstudio-server-2025.05.1-513-amd64.deb
      
          # start the RStudio Server
          sudo systemctl start rstudio-server
      
          # enable RStudio Server to start on boot
          sudo systemctl enable rstudio-server
                      
    • After installation of RStudio Server, check you can access it by:
      • Opening a web browser and navigating to http://localhost:8787.
      • Logging in with the username and password you used when you set up Linux / WSL2.

    If you are using Windows and WSL2, the full in-depth instructions for installing R on WSL2 can be found in this Posit article.

    Python

    Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we will be using Google Colab for the purposes of this workshop. Google Colab is a free, cloud-based Jupyter Notebook environment that lets you run Python code without installing anything on your local machine. The current Python version in Colab is 3.12.

    We will teach Python using the Jupyter Notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported.

    Starting a Google Colab instance

    Go to Google Colab. You will have to log into your Google Account. Click on "New Notebook". Once the login is successful, it should launch a Colab instance.

    To access your datasets in Colab, first upload the "Gapminder_dataset" folder to your Google Drive. You can then mount Google Drive in Colab by clicking on the "Files" icon on the left, followed by the "Mount Drive" icon.
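Drive can also be mounted from a notebook cell instead of the "Mount Drive" icon. The following is a minimal sketch that assumes it runs inside a Colab notebook. The helper name mount_drive is hypothetical, and the /content/drive mount point is taken from the example path used later on this page.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive in Colab; return False when not running in Colab."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        print("google.colab is not available; run this cell inside Colab.")
        return False
    # Colab will ask you to authorize access to your Google account.
    drive.mount(mount_point)
    return True


mount_drive()
```

After the cell finishes in Colab, your Drive contents should appear in the "Files" panel under the mount point.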

    Once you have uploaded your data folder to your Google Drive and mounted it in Colab, you can browse your directories to find your data folder. To get its path, right-click on the folder, choose "Copy path", and set the data_root_dir variable to that path.

    You can use the following code in Colab to store the path to your data:
    data_root_dir = "/content/drive/MyDrive/-your data folder-/-Name of your dataset-/"
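The assignment above only stores the path. To confirm that the folder exists and see what it contains, a check along these lines could follow. This is a sketch: describe_data_dir is a hypothetical helper name, and the path is the placeholder from this page, so replace it with the path you copied.

```python
import os

# Placeholder path from the text above; replace it with your copied path.
data_root_dir = "/content/drive/MyDrive/-your data folder-/-Name of your dataset-/"


def describe_data_dir(path):
    """Return the sorted entry names in path, or None if the folder does not exist."""
    if not os.path.isdir(path):
        return None
    return sorted(os.listdir(path))


entries = describe_data_dir(data_root_dir)
if entries is None:
    print("Folder not found:", data_root_dir)
else:
    print("Found", len(entries), "entries; first few:", entries[:5])
```

If the folder is reported as not found, check that Drive is mounted and that the path was copied in full.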

    Install Python packages

    • pandas: data analysis and manipulation
    • numpy: scientific computing
    • math: mathematical functions and constants
    • matplotlib: creating visualizations in Python
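To see whether these packages are available in your notebook, a quick check like the one below may help. It is a sketch, not part of the official setup: missing_packages is a hypothetical helper name. Note that math is part of the Python standard library, so it never needs to be installed separately.

```python
import importlib.util

# Package names taken from the list above.
REQUIRED = ["pandas", "numpy", "math", "matplotlib"]


def missing_packages(names):
    """Return the names that Python's import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages are available.")
```

The check only looks the packages up and does not import them, so it runs quickly even in a fresh notebook.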
