
Resources

Here is a list of resources on things I learned in grad school, such as how to set up Jupyter on the cloud and how to download NGS data efficiently. I still come back to them for a refresher sometimes. If you are just getting started in bioinformatics and data science, some of these may be helpful.



Linux

New to Linux? Start with this simple tutorial, which covers the basics in a short amount of time. Ryan also has tutorials on other topics such as HTML and CSS. Here are two cheatsheets that I sometimes refer to: Cheatsheet 1 and Cheatsheet 2.

Git

There are a lot of Git resources out there, and I don’t have any favorites. My suggestion is to pick one and get started. Learn the basics first and pick up more as you need it, instead of trying to learn everything at once. Here is a simple guide to get started.

Regular Expressions

Here is a quick YouTube video to get started with regular expressions.
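
A few of the basics in Python's built-in `re` module are shown below: capture groups, the `\d`/`\s` shorthand classes, character sets, and `re.finditer`. This is a small sketch. The read header and DNA sequences are made-up examples, and `parse_header`/`find_motif` are hypothetical helper names.

```python
import re

# Hypothetical FASTQ-style read header, made up for illustration.
# ^ and $ anchor the whole line; \d+ is one or more digits; \s+ is whitespace;
# \. is a literal dot; each (...) is a capture group.
HEADER_RE = re.compile(r"^@(SRR\d+)\.(\d+)\s+length=(\d+)$")


def parse_header(header):
    """Return (run_id, read_number, read_length), or None if the line doesn't match."""
    m = HEADER_RE.match(header)
    if m is None:
        return None
    run_id, read_number, length = m.groups()
    return run_id, int(read_number), int(length)


def find_motif(seq, motif):
    """Return the start position of every non-overlapping match of motif in seq."""
    return [m.start() for m in re.finditer(motif, seq)]


print(parse_header("@SRR1234567.1 length=150"))    # ('SRR1234567', 1, 150)
print(find_motif("TTGAATTCAAGAATTCGG", "GAATTC"))  # [2, 10]  (EcoRI sites)
# A character set: runs of A/C/G/T only, so the N's split the match.
print(re.findall(r"[ACGT]+", "ACGTNNACG"))         # ['ACGT', 'ACG']
```

Compiling the pattern once with `re.compile` is handy when the same regex is applied to millions of lines, such as every header in a FASTQ file.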

Python

Having the right directory structure for your code is key. Understanding how import statements work in Python (for example, the difference between relative and absolute imports) makes a repo much easier to structure and maintain. This page explains how import statements work, and I highly recommend reading it. If you like reading books, check out this short book on some of Python's cool features.
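
To see absolute and relative imports side by side, the sketch below writes a tiny package to a temporary directory and imports it. The package name `demo_pkg`, its layout, and the `shout`/`run`/`build_and_run` functions are hypothetical and exist only for illustration. Both import styles end up pointing at the same function.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package, written to a temporary directory for illustration:
#   demo_pkg/
#       __init__.py
#       cli.py          <- imports shout() both ways
#       utils/
#           __init__.py
#           text.py     <- defines shout()
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/utils/__init__.py": "",
    "demo_pkg/utils/text.py": "def shout(s):\n    return s.upper() + '!'\n",
    "demo_pkg/cli.py": (
        "# Absolute import: the full path from the top-level package.\n"
        "from demo_pkg.utils.text import shout as shout_abs\n"
        "# Relative import: '.' is the package containing this module (demo_pkg).\n"
        "from .utils.text import shout as shout_rel\n"
        "\n"
        "def run(s):\n"
        "    return shout_abs(s), shout_rel(s)\n"
    ),
}


def build_and_run(message):
    """Write the package to disk, import demo_pkg.cli, and call its run()."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = Path(root) / rel_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)
        # Absolute imports are resolved against sys.path, so the directory
        # that *contains* demo_pkg (not demo_pkg itself) must be on it.
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # the files were created after startup
        try:
            cli = importlib.import_module("demo_pkg.cli")
            return cli.run(message)
        finally:
            sys.path.remove(root)


print(build_and_run("hello"))  # ('HELLO!', 'HELLO!')
```

Running `python demo_pkg/cli.py` directly would fail. In that case Python puts `demo_pkg/` itself (not its parent) on `sys.path`, and the file has no parent package. Running `python -m demo_pkg.cli` from the directory that contains `demo_pkg` lets both imports resolve.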

Jupyter on Google Cloud Platform

Jupyter notebooks are amazing for exploratory data analysis and much more. For larger datasets and compute-heavy tasks, you may want to run the notebook on a cloud server. This post will get you started if you use Google Cloud. If you want GPU-enabled notebooks on AWS, have a look at this.

Downloading bulk NGS data efficiently

Here are two posts that really helped me when I needed to download hundreds of immunosequencing datasets for one of my PhD projects (Post 1 and Post 2). The downloads were much faster than with the SRA Toolkit.
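
The method from those two posts is not reproduced here, and the sketch below is not necessarily what they describe. One commonly used alternative to the SRA Toolkit is to download ready-made FASTQ files from the European Nucleotide Archive (ENA), which mirrors SRA runs. The code assumes that ENA's Portal API `filereport` endpoint and its `fastq_ftp` field work as written. Verify the parameters against ENA's documentation before relying on it. The helper names are hypothetical, and the code uses only the Python standard library.

```python
import csv
import io
import shutil
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path

# ENA Portal API "filereport" endpoint (an assumption here; check ENA's docs).
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build the query listing run accessions and FASTQ paths for a study or run."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{FILEREPORT_URL}?{query}"


def parse_fastq_urls(tsv_text):
    """Turn a filereport TSV into a flat list of ftp:// FASTQ URLs."""
    urls = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp holds ';'-separated host/path entries (often two for
        # paired-end runs); it may be empty for some runs, so skip blanks.
        for path in (row.get("fastq_ftp") or "").split(";"):
            if path:
                urls.append("ftp://" + path)
    return urls


def download(url, out_dir):
    """Stream one file into out_dir, keeping its original file name."""
    dest = Path(out_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest


def download_all(accession, out_dir, workers=8):
    """Fetch the file report, then download every FASTQ file using a thread pool."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        urls = parse_fastq_urls(resp.read().decode("utf-8"))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(download, out_dir=out_dir), urls))
```

Usage would look like `download_all("PRJNA000000", "fastq")`, where the accession is a placeholder. Threads fit this job because it is I/O-bound rather than CPU-bound. Keeping `workers` modest is kinder to the server and can avoid throttling.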