Are you currently using Python or R to manage, clean, and/or analyze your data? Would you like to craft a narrative of your research process that includes a mixture of text, interactive code, and dynamic visualizations?
If so, Python and R are both excellent fits. Each programming language has a distinct ecosystem of data science tools that integrate well with data visualization. Furthermore, the growing trend of sharing code and research as a narrative (such as in a "Notebook") relies on these programming languages.
Choosing a language: If you already use Python or R, consider sticking with the language and ecosystem of packages you already know. If you're new to programming, keep in mind that each language has a very similar set of capabilities. R has historically been used more in statistical and quantitative analyses, while Python is a general-purpose programming language used in everything from text analysis to astronomy. Consider which languages and packages folks in your discipline tend to use, but there's really no wrong choice!
Python is a general-purpose programming language that is used widely in the social sciences, physical sciences, digital humanities, etc.
To add data visualization functionality to your code, you must install a Python visualization package (e.g., with pip or an environment manager like Anaconda) and import the package into your script/program. (Read more: installing Python packages with pip; installing an Anaconda distribution)
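As a rough sketch of that install-then-import workflow, here is a minimal script using matplotlib (one widely used visualization package, assumed here to be installed already, e.g. via `pip install matplotlib`):

```python
# Minimal sketch: import an installed visualization package and make a chart.
# Assumes matplotlib has already been installed (e.g. `pip install matplotlib`).
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Some illustrative data to plot
years = [2018, 2019, 2020, 2021]
observations = [12, 30, 47, 65]

fig, ax = plt.subplots()
ax.plot(years, observations, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Observations")
fig.savefig("observations.png")  # writes the chart out as an image file
```

The same pattern — install once, then `import` at the top of each script — applies to any of the packages listed below.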
List of widely used Python visualization packages:
R is a programming language widely used in statistical analysis and the sciences. R has served as an open-source alternative to proprietary statistical analysis packages like SPSS, SAS, and MATLAB. Over time, R has developed a broader and more robust set of data science and computational analysis features, helped in particular by the tidyverse ecosystem of data science packages.
As with Python, you will need to augment R with additional packages to add data visualization support. Most users interact with R through the base R console or an Integrated Development Environment (IDE) such as RStudio -- additional packages can be installed via each environment's install features (or with R's built-in `install.packages()` function). Read more: R packages: a beginner's guide
List of widely used R data visualization libraries:
To try out Shiny, here is a walkthrough for three example applications.
Figure 1: Python code in a Jupyter Notebook with the resulting map visualization using the datashader library. View the source Notebook.
Figure 2: Introduction to Machine Learning via flower classification, using seaborn & plotly. View the source Notebook.
As you write code to generate visualizations, you may also wish to include more interactivity, transparency, and user control in your process. One way to accomplish this is to compose and share your work as a Jupyter Notebook.
A Jupyter Notebook is a single file that may include code, narrative/explanatory text (formatted as Markdown), and the outputs of running that code. Authors can share their notebooks as an .html page, a .pdf file, or an interactive notebook that others can run and manipulate on the fly.
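Under the hood, a notebook file (.ipynb) is plain JSON holding a list of cells. The hand-built example below is only an illustration of the file format — in practice you create notebooks in Jupyter itself, not by hand:

```python
import json

# A minimal, hand-written notebook: one Markdown cell and one code cell.
# Real notebooks are authored in Jupyter; this just shows what the file contains.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Narrative text goes here."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],  # Jupyter stores the cell's output here after it runs
            "source": ["print('hello from a notebook')"],
        },
    ],
}

with open("example.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Because the format is just JSON, notebooks are easy to version, convert (e.g. to HTML or PDF), and share.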
(If you'd like to learn more about using Notebooks for effective storytelling, read: "The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool")
With a Jupyter Notebook, you can write Python code that uses packages like pandas and seaborn to generate visualizations. But unlike code run from a .py file on your computer, these visualizations render within the notebook itself. You can also run code written in other languages, including R!
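For instance, a notebook cell along these lines (a sketch assuming pandas and seaborn are installed — both ship with the Anaconda distribution) builds a small table and plots it:

```python
# Sketch of a typical notebook cell, assuming pandas and seaborn are installed.
import matplotlib
matplotlib.use("Agg")  # in a live notebook, figures render inline instead
import pandas as pd
import seaborn as sns

# A tiny illustrative dataset
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "petal_length": [1.4, 1.3, 4.7, 4.5],
})

# seaborn plots directly from DataFrame columns
ax = sns.stripplot(data=df, x="species", y="petal_length")
ax.figure.savefig("petals.png")
```

In a notebook, the figure from the final line would appear directly beneath the cell, alongside the code and any explanatory Markdown.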
For Cornell students, researchers, and staff: You have several options for running Jupyter Notebooks. If you wish to run them entirely in the cloud (i.e. without downloading files to your computer, with sharing via URLs), we have free access to Microsoft Azure's Notebook service. This is an excellent option for sharing work with others, using code and visualization in classes and workshops, etc. Alternatively, you can run Jupyter Notebook locally on your machine - for instance, it comes pre-installed with the Anaconda distribution manager. More local install information here.