Instructors:
Ryan Horne (UCLA), Jose Niño Muriel (UCSB), Leigh Phan (UCLA), Jamie Jamison (UCLA), Derek Devnich (UCM), Reid Otsuji (UCSD), Cody Hennesy (UCB), Geoff Boushey (UCSF), Kaija Gahm (UCLA), Tim Dennis (UCLA), Julien Brun (UCSB), Jade Li (UCSB)
Helpers:
Kristi Liu (UCSB), Hannah Sutherland (UCLA), Leigh Phan (UCLA), Misha Coleman (UCB), Geno Sanchez (UCLA), Reid Otsuji (UCSD), Alessandra Vidal Meza (UCSB), Kristian Allen (UCLA), Hind Al Ali (UCSB), Celeste Allaband (UCSD)
Register
To attend, please register here. You will receive a Zoom link for each session selected.
General Information
The Carpentries project comprises the Software Carpentry, Data Carpentry, and
Library Carpentry communities of Instructors, Trainers, Maintainers,
helpers, and supporters who share a mission to teach foundational computational and data science
skills to researchers.
Want to learn more and stay engaged with The Carpentries? Carpentries Clippings is The Carpentries' biweekly newsletter, where we share community news, community job postings, and more.
Sign up to receive future editions and read our full archive: https://carpentries.org/newsletter/
Software Carpentry
aims to help researchers get their work done
in less time and with less pain
by teaching them basic research computing skills.
This hands-on workshop will cover basic concepts and tools,
including program design, version control, data management,
and task automation.
Participants will be encouraged to help one another
and to apply what they have learned to their own research problems.
Who:
The course is aimed at graduate students and other researchers.
You don't need to have any previous knowledge of the tools
that will be presented at the workshop.
Requirements:
Participants must bring a laptop with a
Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on.
They should have a few specific software packages installed (listed below).
Accessibility:
We are committed to making this workshop
accessible to everybody.
The workshop organizers have checked that:
The room is wheelchair / scooter accessible.
Accessible restrooms are available.
We are dedicated to providing a positive and accessible learning environment for all.
We do not require participants to provide documentation of disabilities or disclose any unnecessary personal information.
However, we do want to help create an inclusive, accessible experience for all participants.
We encourage you to share any information that would be helpful to make your Carpentries experience accessible.
To request an accommodation for this workshop, please fill out the
accommodation request form.
If you have questions or need assistance with the accommodation form, please email us.
Glosario is a multilingual glossary
for computing and data science terms. The glossary helps
learners who attend workshops and use our lessons make sense of computational and programming jargon written in English by offering it
in their native language. Translating data science terms also gives Carpentries Instructors a teaching tool for reducing barriers
for their learners.
Workshop Recordings:
Carpentries workshops are designed to be interactive rather than lecture-based, with lessons that build upon one another.
To foster a positive online learning environment, we strongly recommend that participants join in real time.
As a result, workshop recordings are not recommended and may not be available to learners.
Roles:
To learn more about the roles at the workshop (who will be doing what),
refer to our Workshop FAQ.
Code of Conduct
Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.
Collaborative Notes
We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.
Surveys
Please be sure to complete these surveys before and after the workshop.
Schedule for Day 1: Tidy Data on Mon, Sep 15, 2025
8:30 AM - 10:00 AM
Using Spreadsheet Programs for Data Organization; Formatting Data Tables in Spreadsheets
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Formatting Problems; Dates as Data
11:30 AM - 12:20 PM
Basic Quality Assurance and Control; Exporting Data from Spreadsheets; Caveats of Popular Data and File Formats
Schedule for Day 2: OpenRefine on Tue, Sep 16, 2025
8:30 AM - 10:00 AM
Introduction to OpenRefine; Working with OpenRefine Projects
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Faceting and Filtering Data; Clustering and Transformations
11:30 AM - 12:20 PM
Exporting and Saving Data; Advanced Data Operations
Schedule for Day 3: Unix on Wed, Sep 17, 2025
8:30 AM - 10:00 AM
Introducing the Shell; Navigating Files and Directories
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Working With Files and Directories; Pipes and Filters
11:30 AM - 12:20 PM
Loops; Shell Scripts; Finding Things
Schedule for Day 4: Git on Thu, Sep 18, 2025
8:30 AM - 10:00 AM
Navigating Git Repositories; Recording Changes to Files
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Viewing the History of Changes; Undoing Changes
11:30 AM - 12:20 PM
Working with Remotes; Collaborating with Others
Schedule for Day 5: SQL on Fri, Sep 19, 2025
8:30 AM - 10:00 AM
Using Databases and SQL; Selecting Data
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Filtering Data; Calculating New Values
11:30 AM - 12:20 PM
Aggregating Data; Combining Data
Schedule for Day 6: Python: Part 1 on Mon, Sep 22, 2025
8:30 AM - 10:00 AM
Running and Quitting; Variables and Assignment
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Data Types and Type Conversion; Built-in Functions and Help
11:30 AM - 12:20 PM
Conditionals; Loops
Schedule for Day 7: Python: Part 2 on Tue, Sep 23, 2025
8:30 AM - 10:00 AM
Functions; Libraries
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Reading Tabular Data into DataFrames; Pandas DataFrames
11:30 AM - 12:20 PM
Plotting
Schedule for Day 8: R: Part 1 on Wed, Sep 24, 2025
8:30 AM - 10:00 AM
Introduction to R and RStudio; Project Management With RStudio
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Seeking Help; Data Structures
11:30 AM - 12:20 PM
Exploring Data Frames; Subsetting Data
Schedule for Day 9: R: Part 2 on Thu, Sep 25, 2025
8:30 AM - 10:00 AM
Vectorization; Dataframe Manipulation with dplyr
10:00 AM - 10:10 AM
Break
10:10 AM - 11:30 AM
Creating Publication-Quality Graphics with ggplot2; Producing Reports With knitr
11:30 AM - 12:20 PM
Setup
To participate in a
Software Carpentry
workshop,
you will need access to software as described below.
In addition, you will need an up-to-date web browser.
To interact with spreadsheets, we can use LibreOffice,
Microsoft Excel,
Gnumeric,
OpenOffice.org,
or other programs.
Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.
In this lesson we will be using Excel, but you can use Google Sheets as well.
If you don't have a spreadsheet program already, you can install LibreOffice by following the instructions below.
Download the Installer:
Install LibreOffice by going to the
installation page.
The version for Windows should automatically be selected.
Click Download.
You will go to a page that asks about a donation, but you don't need to make one.
Your download should begin automatically.
Install LibreOffice:
Once the installer is downloaded, double click on it and it should install.
Download the Installer:
Install LibreOffice by going to the
installation page.
The version for Mac OS should automatically be selected.
Click Download.
You will go to a page that asks about a donation, but you don't need to make one.
Your download should begin automatically.
Install LibreOffice:
The file LibreOffice_X.X.X_MacOS_x86-64 (whichever version of LibreOffice you have selected) should have been downloaded.
Double click on this file, and LibreOffice will be installed.
Download the Installer:
Install LibreOffice by going to the
installation page.
The version for Linux should automatically be selected.
Click Download.
You will go to a page that asks about a donation, but you don't need to make one.
Your download should begin automatically.
Install LibreOffice:
Once the installer is downloaded, double click on it and it should install.
OpenRefine
For this lesson you will need OpenRefine and a
web browser. Note: this is a Java program that runs on your machine (not in the cloud).
It runs inside a web browser, but no web connection is needed.
On Windows: check that you have either the Firefox or the Chrome browser installed and set as your default browser.
OpenRefine runs in your default browser.
It will not run correctly in Internet Explorer.
Unzip the downloaded file into the OpenRefine directory by right-clicking and selecting "Extract ...".
Go to your newly created OpenRefine directory.
Launch OpenRefine by clicking openrefine.exe (this will launch a command prompt window, but you can ignore that - just wait for OpenRefine to open in the browser).
If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.
On macOS: check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It may not run correctly in Safari.
Unzip the downloaded file into the OpenRefine directory by double-clicking it.
Go to your newly created OpenRefine directory.
Install OpenRefine by dragging the icon into the Applications folder.
Then use Ctrl-click/Open ... to launch it.
If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.
On Linux: check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser.
Unzip the downloaded file into the OpenRefine directory.
Go to your newly created OpenRefine directory.
Launch OpenRefine by entering ./refine into the terminal within the OpenRefine directory.
If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.
SQLite
SQL is a specialized programming language used with databases. You have two options for SQL: you can
use a database manager called SQLite or install DB Browser for SQLite.
Copy the following command:
curl -fsSL https://jmjamison.github.io/2025-09-15-UC/getsql.sh | bash
Paste it into the window that Git Bash opened. If you're unsure, ask an instructor for help.
You should see something like 3.50.2 2025-06-28...
If you want to do this manually, download sqlite3, make a bin directory in your home directory, unzip sqlite3, move it into the bin directory, and then add the bin directory to your PATH.
If you have previously installed the Anaconda Distribution, SQLite is already installed. You can check by
opening an Anaconda Prompt and typing sqlite3.
You should see something like SQLite version 3.50.2 2025-06-28 14:00:48
DB Browser for SQLite is a visual tool to create, edit, and query
SQLite databases. SQLite is included with DB Browser for SQLite, so it does not have to be
installed separately.
There are a few options for Windows, but most modern computers can use the Standard installer for 64-bit Windows.
The .zip (no installer) can be run directly from the folder, after extracting the contents of the zip file.
It will not show up in the Start menu.
There are also two options for MacOS. Most people should use the DB Browser for SQLite (Universal) installer.
Launch DB Browser for SQLite to confirm that the installation was successful.
The Bash Shell
Bash is a commonly-used shell that gives you the power to do
tasks more quickly.
Click on "Next" four times (two times if you've previously
installed Git). You don't need to change anything
in the Information, location, components, and start menu screens.
From the dropdown menu, "Choosing the default editor used by Git", select "Use the Nano editor by default" (NOTE: you will need to scroll up to find it) and click on "Next".
On the page that says "Adjusting the name of the initial branch in new repositories", ensure that
"Let Git decide" is selected. This will ensure the highest level of compatibility for our lessons.
Ensure that "Git from the command line and also from 3rd-party software" is selected and
click on "Next". (If you don't do this Git Bash will not work properly, requiring you to
remove the Git Bash installation, re-run the installer and to select the "Git from the
command line and also from 3rd-party software" option.)
Select "Use bundled OpenSSH".
Ensure that "Use the native Windows Secure Channel Library" is selected and click on "Next".
Ensure that "Checkout Windows-style, commit Unix-style line endings" is selected and click on "Next".
Ensure that "Use Windows' default console window" is selected and click on "Next".
Ensure that "Default (fast-forward or merge)" is selected and click on "Next".
Ensure that "Git Credential Manager" is selected and click on "Next".
Ensure that "Enable file system caching" is selected and click on "Next".
Click on "Install".
Click on "Finish" or "Next".
If your "HOME" environment variable is not set (or you don't know what this is):
Open command prompt (open the Start Menu, then type cmd and press Enter).
Type the following line into the command prompt window exactly as shown:
setx HOME "%USERPROFILE%"
Press Enter; you should see SUCCESS: Specified value was saved.
Quit command prompt by typing exit and then pressing Enter.
This will provide you with both Git and Bash in the Git Bash program.
Video Tutorial
The default shell in macOS Catalina (10.15) and newer versions is Zsh, but
Bash is available in all versions, so there is no need to install anything.
You can access Bash from the Terminal (found in
/Applications/Utilities).
See the Git installation video tutorial
for an example on how to open the Terminal.
You may want to keep Terminal in your dock for this workshop.
To see if your default shell is Bash, type echo $SHELL
in Terminal and press the Return key. If the message
printed does not end with '/bash', then your default is something
else. You can change your current shell to Bash by typing
bash and then pressing Return. To check
your current shell, type echo $0 and press Return.
To change your default shell to Bash type chsh -s /bin/bash and
press the Return key, then reboot for the change to take effect. To
change your default back to Zsh, type chsh -s /bin/zsh, press the
Return key and reboot. To check available shells, type
cat /etc/shells.
On Linux, the default shell is usually Bash and there is usually no need to
install anything.
To see if your default shell is Bash, type echo $SHELL
in Terminal and press the Return key. If the message
printed does not end with '/bash', then your default is something
else. You can change your current shell to Bash by typing
bash and then pressing Return. To check
your current shell, type echo $0 and press Return.
To change your default shell to Bash type chsh -s /bin/bash and
press the Return key, then reboot for the change to take effect. To
change your default back to Zsh, type chsh -s /bin/zsh, press the
Return key and reboot. To check available shells, type
cat /etc/shells.
Git
Git is a version control system that lets you track who made changes
to what when and has options for easily updating a shared or public
version of your code
on github.com. You will need a
supported
web browser.
You will need a GitHub account from github.com
for parts of the Git lesson, specifically episodes 7 & 8.
To create one, go to github.com and follow the "Sign up" link at the top-right of the window.
On macOS, open the Terminal app, type git --version and press
Enter/Return. If Git is not installed already,
follow the instructions to install the "command line
developer tools". Do not click "Get Xcode", because that will
take too long and is not necessary for our Git lesson.
After installing these tools, there won't be anything in your /Applications
folder, as they and Git are command line programs.
For older versions of OS X (10.5-10.8) use the
most recent available installer labelled "snow-leopard"
available here.
(Note: this project is no longer maintained.)
Because this installer is not signed by the developer, you may have to
right click (control click) on the .pkg file, click Open, and click
Open in the pop-up dialog.
If Git is not already available on your machine you can try to
install it via your distro's package manager. For Debian/Ubuntu run
sudo apt-get install git and for Fedora run
sudo dnf install git.
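Once the tools above are installed, it can be handy to confirm in one go that they are reachable from the command line. The following is a minimal sketch, not part of the official setup. It assumes a Python 3 interpreter is available on your own machine (running it in Google Colab would inspect the Colab machine, not your laptop), and tool_version is a hypothetical helper name introduced here for illustration.

```python
import shutil
import subprocess


def tool_version(command, flag="--version"):
    """Return the first line printed by `command flag`, or None if the command is not on PATH.

    Hypothetical helper for illustration; not part of the workshop materials.
    """
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True, check=False)
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


# Tools named in the setup instructions above.
for tool in ("git", "bash", "sqlite3"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

A tool reported as "not found on PATH" is either not installed or installed in a location your shell does not search.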
Preparing Your Working Directory
We'll do our work in the Desktop folder so make sure you change your working directory to it with:
$ cd
$ cd Desktop
Text Editor
When you're writing code, it's nice to have a text editor that is
optimized for writing code, with features like automatic
color-coding of key words. The default text editor on macOS and
Linux is usually set to Vim, which is not famous for being
intuitive. If you accidentally find yourself stuck in it, hit
the Esc key, followed by :+q+!
(colon, lower-case 'q', exclamation mark), then hit Return to
return to the shell.
nano is a basic editor and the default that instructors use in the workshop.
It is installed along with Git.
On macOS and Linux, nano should be pre-installed.
See the Git installation video tutorial
for an example on how to open nano.
R
R is a programming language
that is especially powerful for data exploration, visualization, and
statistical analysis. To interact with R in our lessons, we typically use
RStudio.
R and RStudio are two separate pieces of software.
On Ubuntu Linux, you can install R by navigating to CRAN and following the instructions outlined there, using your package manager. We have reproduced the commands below:
Use the terminal command prompt to type/copy-and-paste these commands in, pressing Enter after each line to run the command.
Do not run the lines that start with #; these are comments and are not part of the commands.
# update indices
sudo apt update -qq
# install two helper packages we need
sudo apt install --no-install-recommends software-properties-common dirmngr
# add the signing key for these repos
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
# add the repo from CRAN
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
# install R itself
sudo apt install --no-install-recommends r-base
# install dependencies
sudo apt install -y r-base-core r-recommended r-base-dev gdebi-core build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev
sudo apt install --no-install-recommends gdebi-core
# cd ~/Downloads
# download the latest RStudio Server .deb file
wget https://download2.rstudio.org/server/jammy/amd64/rstudio-server-2025.05.1-513-amd64.deb
# install the .deb file
sudo gdebi rstudio-server-2025.05.1-513-amd64.deb
# start the RStudio Server
sudo systemctl start rstudio-server
# enable RStudio Server to start on boot
sudo systemctl enable rstudio-server
After installation of RStudio Server, check you can access it by:
Opening a web browser and navigating to http://localhost:8787.
Logging in with the username and password you used when you set up Linux / WSL2.
If you are using Windows and WSL2, the full in-depth instructions for installing R on WSL2 can be found in this
Posit article.
Python
Python is a popular language for research computing, and great for general-purpose programming as well.
Installing all of its research packages individually can be a bit difficult, so we will be using Google Colab for the purposes of this workshop. Google Colab is a free online cloud-based
Jupyter Notebook environment that allows you to run your Python code without installing anything on your local machine. The current Python version in Colab is 3.12.
We will teach Python using the Jupyter Notebook,
a programming environment that runs in a web browser. For this to work you will need a reasonably
up-to-date browser. The current versions of the Chrome, Safari and
Firefox browsers are all supported.
Go to Google Colab. You will have to log into your Google Account.
Click on "New Notebook". Once the login is successful, it should launch a Colab instance.
In order to access your datasets in Colab, you can first upload the "Gapminder_dataset" folder to your Google Drive.
You can then mount the Google Drive on Colab by clicking on the "Files" icon on the left,
followed by clicking on "Mount Drive" icon.
Once you have uploaded your data folder to your Google Drive and mounted it in Colab, you can go through your directories to get to your
"Your data" folder. To get the path to this folder, right-click on the "Your data" folder, select "Copy path",
and set the data_root_dir variable to that path.
You can use the following code in Colab to make sure your path to your data is correct:
data_root_dir = "/content/drive/MyDrive/-your data folder-/-Name of your dataset-/"
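Setting the variable alone does not confirm that the folder exists. A minimal sketch of one way to check it in a Colab cell follows. The helper name describe_data_dir is hypothetical, and the path is the placeholder from the line above, which you would replace with the path you copied.

```python
from pathlib import Path


def describe_data_dir(data_root_dir):
    """Return the sorted entry names in data_root_dir, or None if it is not a directory.

    Hypothetical helper for illustration; not part of the workshop materials.
    """
    root = Path(data_root_dir)
    if not root.is_dir():
        return None
    return sorted(entry.name for entry in root.iterdir())


# Placeholder path from the instructions; replace it with the path you copied.
data_root_dir = "/content/drive/MyDrive/-your data folder-/-Name of your dataset-/"

entries = describe_data_dir(data_root_dir)
if entries is None:
    print(f"Not found: {data_root_dir} (is Drive mounted and is the path correct?)")
else:
    print(f"Found {len(entries)} entries in {data_root_dir}: {entries}")
```

If the folder is reported as not found, check that Google Drive is mounted and that the copied path matches the folder shown in the Files pane.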
Install Python packages
pandas: data analysis and manipulation
numpy: scientific computing
math: mathematical functions and constants
matplotlib: creating visualizations in Python
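In Colab these packages are typically already available, and math ships with Python's standard library, so there may be nothing to install. A minimal sketch for checking which of them can be imported follows; find_missing is a hypothetical helper name introduced here for illustration.

```python
import importlib.util

# Packages listed above; math is part of the Python standard library.
PACKAGES = ["pandas", "numpy", "math", "matplotlib"]


def find_missing(packages):
    """Return the names in `packages` that Python cannot locate for import.

    Hypothetical helper for illustration; not part of the workshop materials.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Any name reported as missing would need to be installed before the Python lessons.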