Get Started with DataFrames.jl in Julia

Learn simple ways to get started with DataFrames.jl.
Published: April 23, 2023

In this tutorial, we will explore DataFrames.jl, a library for working with tabular data in the Julia programming language. Whether you’re a beginner in data science or an experienced practitioner, DataFrames.jl can help you load, manipulate, and analyze your data concisely. It does not draw plots itself, but its tables can be passed to Julia’s plotting packages for visualization.

What is DataFrames.jl?

DataFrames.jl is a Julia package for working with two-dimensional, tabular data structures called DataFrames. It offers a wide range of functions for data manipulation, cleaning, and transformation, making it a powerful tool for data analysis.

Installing DataFrames.jl

First, let’s install DataFrames.jl by running the following commands in the Julia REPL (Read-Eval-Print Loop):

using Pkg
Pkg.add("DataFrames")

Once the installation is complete, you can load the package:

using DataFrames

Creating a DataFrame

Now, let’s create a simple DataFrame. There are several ways to create a DataFrame, but we’ll use the DataFrame() constructor for this tutorial.

df = DataFrame(; Name = ["Alice", "Bob", "Charlie"], Age = [25, 30, 35], Salary = [50000, 55000, 60000])
3×3 DataFrame
 Row │ Name     Age    Salary
     │ String   Int64  Int64
─────┼────────────────────────
   1 │ Alice       25   50000
   2 │ Bob         30   55000
   3 │ Charlie     35   60000

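This creates a DataFrame with three columns: Name, Age, and Salary. Each keyword argument defines one column: the keyword becomes the column name and the vector assigned to it becomes the column data. All vectors must have the same length.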
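The keyword form is only one of the several construction styles mentioned above. As a minimal sketch, assuming the same kind of small example data (the variable names df_pairs, df_rows, and df_empty are made up for illustration), the same table can also be built from name => vector pairs, from a vector of NamedTuples, or by pushing rows onto empty typed columns:

```julia
using DataFrames

# Column name => column data pairs; same result as the keyword form.
df_pairs = DataFrame("Name" => ["Alice", "Bob"], "Age" => [25, 30])

# A vector of NamedTuples, one NamedTuple per row.
df_rows = DataFrame([(Name = "Alice", Age = 25), (Name = "Bob", Age = 30)])

# Start from empty, typed columns and append rows with push!.
df_empty = DataFrame(Name = String[], Age = Int[])
push!(df_empty, ("Alice", 25))              # a Tuple is matched by position
push!(df_empty, (Name = "Bob", Age = 30))   # a NamedTuple is matched by name
```

The pair form is handy when column names are not valid Julia identifiers, and the push! form suits data that arrives one row at a time.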

Basic DataFrame Operations

Now that we have a DataFrame, let’s explore some basic operations.

Accessing DataFrame Columns

To access a specific column, use dot syntax, which returns the column vector stored in the DataFrame:

df.Name
3-element Vector{String}:
 "Alice"
 "Bob"
 "Charlie"

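Or, you can use indexing, where the first : selects all rows and :Name is the column name written as a Symbol. This form returns a copy of the column: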

df[:, :Name]
3-element Vector{String}:
 "Alice"
 "Bob"
 "Charlie"
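The difference between the two access forms matters once you start modifying data. The following sketch, using a reduced version of the tutorial's table, shows that df.Name and df[!, :Name] return the stored column itself, while df[:, :Name] returns an independent copy (the variable names are illustrative only):

```julia
using DataFrames

df = DataFrame(Name = ["Alice", "Bob", "Charlie"], Age = [25, 30, 35])

col_direct = df.Name        # the stored column itself, no copy
col_bang   = df[!, :Name]   # indexing equivalent of df.Name
col_copy   = df[:, :Name]   # a fresh copy of the column

# Changing the copy leaves the DataFrame untouched.
col_copy[1] = "Alicia"
still_alice = df.Name[1]

# Both non-copying forms refer to the very same vector.
same_vector = col_direct === col_bang
```

Use the copying form when you want to experiment with a column without risking changes to the original table.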

Adding a New Column

To add a new column, assign a vector to a new column name. The vector must contain one element per row. The REPL echoes the assigned vector:

df.Gender = ["Female", "Male", "Male"]
3-element Vector{String}:
 "Female"
 "Male"
 "Male"
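New columns are often computed from existing ones rather than typed in by hand. The sketch below assumes two hypothetical column names, AgeNextYear and ID, that are not part of the tutorial's data. It derives a column with broadcasting and uses insertcols! to place a column at a chosen position:

```julia
using DataFrames

df = DataFrame(Name = ["Alice", "Bob", "Charlie"], Age = [25, 30, 35])

# Broadcasting (.+) applies the operation to every row.
df.AgeNextYear = df.Age .+ 1

# insertcols! adds a column at a given position (here: first).
insertcols!(df, 1, :ID => 1:3)

column_names = names(df)
```

Plain assignment always appends the new column at the end, so insertcols! is the option to reach for when column order matters.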

Selecting Rows

You can select specific rows by passing the row indices as the first index and : as the second index to keep all columns:

df[1:2, :]
2×4 DataFrame
 Row │ Name    Age    Salary  Gender
     │ String  Int64  Int64   String
─────┼───────────────────────────────
   1 │ Alice      25   50000  Female
   2 │ Bob        30   55000  Male
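Row and column selection can be combined in a single indexing expression. This is a small sketch on the tutorial's data (result variable names are illustrative) showing a few common variations:

```julia
using DataFrames

df = DataFrame(Name = ["Alice", "Bob", "Charlie"],
               Age = [25, 30, 35],
               Salary = [50000, 55000, 60000])

first_two = df[1:2, [:Name, :Age]]   # rows 1-2, only two columns
picked    = df[[1, 3], :]            # rows 1 and 3, all columns
top2      = first(df, 2)             # the first two rows
last_row  = df[end, :]               # a single row (a DataFrameRow)
```

Indexing with a single row number returns a DataFrameRow rather than a DataFrame, and its fields can be read with the same dot syntax, for example last_row.Name.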

Filtering Rows

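To filter rows based on specific conditions, use the filter() function. It calls the given function on each row and returns a new DataFrame containing only the rows for which the function returns true: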

filtered_df = filter(row -> row.Age > 30, df)
1×4 DataFrame
 Row │ Name     Age    Salary  Gender
     │ String   Int64  Int64   String
─────┼────────────────────────────────
   1 │ Charlie     35   60000  Male
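The row-wise anonymous function is not the only way to express a condition. As a sketch on a reduced version of the table (variable names are illustrative), the following forms select the same rows as the example above, and the last line combines two conditions:

```julia
using DataFrames

df = DataFrame(Name = ["Alice", "Bob", "Charlie"], Age = [25, 30, 35])

by_row  = filter(row -> row.Age > 30, df)     # the form used in the text
by_col  = filter(:Age => >(30), df)           # predicate receives only the Age value
by_mask = df[df.Age .> 30, :]                 # Boolean mask indexing
by_sub  = subset(df, :Age => ByRow(>(30)))    # subset with a row-wise predicate

# Several conditions can be combined with && inside the predicate.
both = filter(row -> row.Age >= 30 && startswith(row.Name, "B"), df)
```

The column => predicate form passes only the named column to the function, which is typically faster than the whole-row form on large tables.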

Sorting Data

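You can sort a DataFrame in place using the sort!() function, which modifies df itself. Use sort() instead if you want a sorted copy and an unchanged original: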

sort!(df, :Age)
3×4 DataFrame
 Row │ Name     Age    Salary  Gender
     │ String   Int64  Int64   String
─────┼────────────────────────────────
   1 │ Alice       25   50000  Female
   2 │ Bob         30   55000  Male
   3 │ Charlie     35   60000  Male
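Because the example data is already ordered by Age, the call above leaves it visually unchanged. The sketch below (variable names are illustrative) uses the non-mutating sort to show a descending sort and a sort on two columns with different directions:

```julia
using DataFrames

df = DataFrame(Name = ["Alice", "Bob", "Charlie"],
               Age = [25, 30, 35],
               Gender = ["Female", "Male", "Male"])

# sort returns a new DataFrame; df keeps its original order.
oldest_first = sort(df, :Age, rev = true)

# Sort by Gender ascending, then by Age descending within each Gender.
by_two = sort(df, [:Gender, order(:Age, rev = true)])
```

The order helper sets the direction for one column only, whereas the rev keyword on sort applies to all sort columns at once.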

Summary

In this tutorial, we covered the basics of DataFrames.jl: installing the package, creating a DataFrame, accessing and adding columns, and selecting, filtering, and sorting rows. DataFrames.jl provides a wide range of functions for working with tabular data. We encourage you to explore its capabilities further and integrate it into your data science projects.

Citation

For attribution, please cite this work as:
“Get Started with DataFrames.jl in Julia,” Apr. 23, 2023. https://juliacheat.codes/how-to/get-started-with-dataframes.