Image by Author
Julia is a high-level, general-purpose language designed for high-performance computing. It is gaining popularity among data practitioners and researchers thanks to its readable syntax, fast code execution, and strong machine learning ecosystem.
Due to the popularity of integrated notebooks, data scientists and researchers are now running Python, R, Bash, Scala, Ruby, and SQL in Jupyter Notebook. In this tutorial, we will learn to install Julia and set it up for Jupyter Notebook. We will then load a CSV file and perform time series data visualization.
Julia can be used by running code in a REPL or executing the `.jl` file, but running the code in a Jupyter notebook gives us more control over experimentation. You can perform data analysis, train machine learning models, or even create a Julia package using the notebook.
Step 1: Download and Install Julia
You can download and install the current stable release of Julia by visiting the official website. The stable release is available for Windows, Linux, and macOS.
It took me a few minutes to download and install Julia for Windows. To start the Julia REPL, type `julia` in PowerShell, Terminal, or Bash. On Windows, you can also find the Julia icon in the Start menu and click it to launch the REPL.
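Once the REPL is open, a quick sanity check is to print the version you just installed. This is a minimal sketch that relies only on the built-in `VERSION` constant. The `1.6` threshold below is an arbitrary value chosen for illustration, not a requirement stated in this article.

```julia
# VERSION is a built-in constant of type VersionNumber
println("Running Julia ", VERSION)

# Hypothetical guard: the v"1.6" cutoff is an illustrative assumption
if VERSION < v"1.6"
    @warn "This Julia release is fairly old; consider installing the current stable release."
end
```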
Step 2: Install IJulia
To integrate Julia with Jupyter Notebook, you need to install the IJulia package.
In the Julia REPL, type:
    using Pkg
    Pkg.add("IJulia")
Image by Author | Julia REPL
You can also install the package by typing `]` in the REPL to enter Pkg (package manager) mode and then typing `add IJulia`. Note that the package name is case-sensitive.
Image by Author | Installing IJulia
Step 3: Running Julia in Jupyter Notebook
We are now ready to use Jupyter Notebook. Launch Jupyter Notebook, click the New button, and select the Julia kernel.
Image by Author | Jupyter Notebook
For VSCode, create a new Jupyter Notebook file and change the Kernel from Python to Julia by clicking on the Kernel name as shown below.
We now have R, Python, and Julia environments. You can switch between them based on your requirements.
Image by Author | VS Code Jupyter Notebook
With the kernel set up, let’s write a simple line of code to print some text. Just as with Python, the command executes smoothly.
Image by Author | Code execution on Jupyter Notebook
    print("Visit KDnuggets.com for more cheat sheets and additional learning resources.")
    >>> Visit KDnuggets.com for more cheat sheets and additional learning resources.
You can install any Julia package from within a Jupyter cell by typing `using Pkg` and then calling `Pkg.add` with the package name.
We will be installing DataFrames, CSV, Plots, PyPlot, and RollingFunctions.
    using Pkg
    Pkg.add("DataFrames")
    Pkg.add("CSV")
    Pkg.add("Plots")
    Pkg.add("PyPlot")
    Pkg.add("RollingFunctions")
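If you rerun the notebook often, you may prefer to check which packages are still missing before installing anything. The sketch below is one possible approach, not the method used in this article: `missing_packages` is a hypothetical helper name, and it relies on `Base.find_package`, which returns `nothing` when a package cannot be found in the active environment.

```julia
# Hypothetical helper (not part of Pkg): return the names that
# Base.find_package cannot locate in the current environment.
function missing_packages(names::Vector{String})
    return filter(name -> Base.find_package(name) === nothing, names)
end

required = ["DataFrames", "CSV", "Plots", "PyPlot", "RollingFunctions"]
to_install = missing_packages(required)
println("Packages still to install: ", to_install)
```

If `to_install` is not empty, you can pass it to `Pkg.add`, which accepts a vector of package names as well as a single name.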
Read CSV file
To load packages, type `using` followed by the package names, separated by commas.
Next, we are going to download US COVID tracking data and save the CSV file as “covid_us.csv”.
Then, we will use `CSV.read` to read the CSV file and convert it into a DataFrame. We will select only two columns, “date” and “totalTestResultsIncrease”, and parse the date column using the `yyyymmdd` format.
In the end, we will:
- Filter the results to keep only positive values (this drops zero and negative entries)
- Sort the data frame in ascending order by date
- Display the last 5 rows.
    using Downloads, DataFrames, CSV, Plots, Dates

    download_covid = Downloads.download("https://api.covidtracking.com/v1/us/daily.csv", "covid_us.csv")
    columns = [:date, :totalTestResultsIncrease]
    fmt = "yyyymmdd"
    t = Dict(:date=>Date)
    covid_df = CSV.read("covid_us.csv", DataFrame, dateformat=fmt, select=columns, types=t)
    covid_df = sort(filter(row -> row.totalTestResultsIncrease > 0, covid_df))
    last(covid_df,5)
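To see what the date parsing, filtering, and sorting in the code above do without downloading anything, here is a small sketch that uses only the standard library. The rows are toy values invented for illustration (they are not real COVID figures), and the field name `increase` is a stand-in for `totalTestResultsIncrease`.

```julia
using Dates

# Toy rows, invented for illustration only
rows = [
    (date = "20210103", increase = 120),
    (date = "20210101", increase = -5),
    (date = "20210102", increase = 0),
    (date = "20210104", increase = 80),
]

# Same pattern as the `fmt` string passed to CSV.read
fmt = dateformat"yyyymmdd"

# Turn each "yyyymmdd" string into a Date value
parsed = [(date = Date(r.date, fmt), increase = r.increase) for r in rows]

# Keep strictly positive increases, then sort by date in ascending order
cleaned = sort(filter(r -> r.increase > 0, parsed), by = r -> r.date)

for r in cleaned
    println(r.date, "  ", r.increase)
end
```

Because the condition is `> 0`, rows with a zero increase are dropped along with the negative ones.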
Data Visualization with Plots and RollingFunctions
I have modified Jonathan Dinu’s code to display the USA total testing capacity bar chart.
We will be using Plots.jl to display a sticks (bar-like) chart and RollingFunctions.jl to compute a 7-day average of the daily test increase.
    using RollingFunctions

    # plot daily test increase as sticks
    Plots.plot(covid_df.date, covid_df.totalTestResultsIncrease, seriestype=:sticks, label="Test Increase", title = "USA Total Testing Capacity", lw = 2)

    # 7-day average using rolling mean
    window = 7
    average = rollmean(covid_df.totalTestResultsIncrease, window)

    # we mutate the existing plot
    Plots.plot!(covid_df.date, cat(zeros(window - 1), average, dims=1), label="7-day Average", lw=3)
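The `cat(zeros(window - 1), average, dims=1)` call is there because a rolling mean over full windows yields fewer values than the input, so the result is padded to line up with the dates. The sketch below illustrates that length mismatch using only the standard library. `rolling_mean` is a hypothetical stand-in written for this example, not the RollingFunctions implementation, and the numbers are toy values.

```julia
using Statistics

# Hypothetical stand-in for a rolling mean: one value per full window of length w
function rolling_mean(values::AbstractVector{<:Real}, w::Integer)
    return [mean(@view values[i:i + w - 1]) for i in 1:length(values) - w + 1]
end

daily = Float64[10, 20, 30, 40, 50, 60, 70, 80]  # toy values, not real data
window = 7

average = rolling_mean(daily, window)

# Pad the front with zeros so the series has one value per day again
padded = cat(zeros(window - 1), average, dims = 1)

println("input length:    ", length(daily))
println("averages length: ", length(average))
println("padded length:   ", length(padded))
```

Padding with zeros draws the first `window - 1` days at zero on the chart. An alternative would be to skip the padding and plot the averages against the dates from index `window` onward.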
This is awesome.
You can easily find Julia alternatives to Python and R data analytics packages by visiting the Julia Packages webpage.
Julia is easy to use, and its code often executes faster than equivalent Python code. If you are transitioning from R or MATLAB to Julia, the syntax and package ecosystem will feel natural to adopt.
It is a general-purpose language, and it has recently started attracting the machine learning community thanks to native packages written entirely in Julia that aim to provide faster training and inference.
If you have any questions regarding Julia, do ask me in the comments. You can also join the Julia community on Slack, Discord, and Discourse to learn more about the latest developments.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a Bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.