Polars: A Fast DataFrame Library in Rust with a Python API

Let's begin!

Installing Polars

pip install polars

Importing Polars

import polars as pl

Creating DataFrames

# From a list
data = [1, 2, 3, 4, 5]
df = pl.DataFrame({"column_name": data})

# From a dictionary
data = {"Name": ["John", "Jane", "Mike"], "Age": [25, 30, 35]}
df = pl.DataFrame(data)

# From a CSV file
df = pl.read_csv("file.csv")

Basic Operations

# Display the DataFrame
df

# Display the first few rows of the DataFrame
df.head(n=5)

# Summary statistics for each column
df.describe()

# Column names and data types
df.schema

# Accessing a column by name
df["column_name"]

# Accessing multiple columns
df[["column1", "column2"]]

# Filtering rows based on a condition
df.filter(pl.col("column") > 5)

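# Applying a Python function to each value of a column ("function" is your own callable;
# this runs Python per value, so prefer native expressions when possible)
df = df.with_columns(pl.col("column").map_elements(function))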

Data Manipulation

# Renaming columns
df = df.rename({"old_name": "new_name"})

# Dropping columns
df = df.drop(["column1", "column2"])

# Sorting values (descending=True for largest first)
df = df.sort("column", descending=False)

# Dropping duplicate rows
df = df.unique()

# Grouping by a column and aggregating (one expression per aggregation)
df = df.group_by("column").agg(pl.col("column1").mean(), pl.col("column2").sum())

# Merging DataFrames
df_merged = df1.join(df2, on="column", how="inner")

Data Cleaning

# Checking for missing values (number of nulls in each column)
df.null_count()

# Handling missing values
df = df.drop_nulls()  # Drop rows with missing values
df = df.fill_null(value)  # Fill missing values with a specific value

# Replacing values in a column
df = df.with_columns(pl.col("column").replace(old_value, new_value))

# Changing data types (pass a Polars dtype such as pl.Int64, pl.Float64 or pl.String)
df = df.with_columns(pl.col("column").cast(pl.Float64))

Working with Dates

# Converting a string column to datetime
df = df.with_columns(pl.col("date_column").str.to_datetime(format="%Y-%m-%d"))

# Extracting components from a datetime column (alias each one to avoid duplicate names)
df = df.with_columns(
    pl.col("date_column").dt.year().alias("year"),
    pl.col("date_column").dt.month().alias("month"),
    pl.col("date_column").dt.day().alias("day"),
)

# Resampling time series data: sum "value" per day (data must be sorted by date_column)
df = df.sort("date_column").group_by_dynamic("date_column", every="1d").agg(pl.col("value").sum())

Visualization

# Plotting needs an optional dependency (hvplot in earlier Polars releases, Altair in newer ones)

# Plotting a line chart
df.plot.line(x="x_column", y="y_column")

# Plotting a bar chart
df.plot.bar(x="x_column", y="y_column")

# Plotting a scatter plot
df.plot.scatter(x="x_column", y="y_column")

# Plotting a histogram of a single column
df["column"].plot.hist()

I hope you found this quick summary of Polars useful! This cheat sheet covers some commonly used Polars features, but there's a lot more to explore. If you want to dive deeper, I recommend the official Polars documentation:

Polars Rust API documentation: https://docs.rs/polars/latest/polars/

Polars user guide and Python API reference: https://docs.pola.rs/

The documentation provides detailed explanations, examples, and additional features available in Polars. It's a great resource to further enhance your understanding and proficiency with Polars. Happy coding!