Polars: A Fast DataFrame Library in Rust with a Python API
Let's begin!
Installing Polars
pip install polars
Importing Polars
import polars as pl
Creating DataFrames
# From a list
data = [1, 2, 3, 4, 5]
df = pl.DataFrame({"column_name": data})
# From a dictionary
data = {"Name": ["John", "Jane", "Mike"], "Age": [25, 30, 35]}
df = pl.DataFrame(data)
# From a CSV file
df = pl.read_csv("file.csv")
Basic Operations
# Display the DataFrame
df
# Display the first few rows of the DataFrame
df.head(n=5)
# Get summary statistics (count, mean, min, max, ...) for each column
df.describe()
# Accessing a column by name
df["column_name"]
# Accessing multiple columns
df[["column1", "column2"]]
# Filtering rows based on a condition
df.filter(pl.col("column") > 5)
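# Applying a Python function to each value in a column
# (with_columns replaces the removed with_column; map_elements was named apply in older releases)
df = df.with_columns(pl.col("column").map_elements(function))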
Data Manipulation
# Renaming columns
df = df.rename({"old_name": "new_name"})
# Dropping columns
df = df.drop(["column1", "column2"])
# Sorting values
df = df.sort("column", descending=False)
# Dropping duplicate rows
df = df.unique()
# Grouping by a column and aggregating
df = df.group_by("column").agg(pl.col("column1").mean(), pl.col("column2").sum())
# Merging DataFrames
df_merged = df1.join(df2, on="column", how="inner")
Data Cleaning
# Checking for missing values
df.null_count() # Number of null values in each column
# Handling missing values
df = df.drop_nulls() # Drop rows with missing values
df = df.fill_null(value) # Fill missing values with a specific value
# Replacing values
df = df.with_columns(pl.col("column").replace(old_value, new_value))
# Changing data types
df = df.with_columns(pl.col("column").cast(pl.Int64)) # Pass a Polars dtype such as pl.Int64 or pl.Float64
Working with Dates
# Converting a column to datetime
df = df.with_columns(pl.col("date_column").str.to_datetime(format="%Y-%m-%d"))
# Extracting components from a datetime column
df = df.with_columns(pl.col("date_column").dt.year().alias("year"), pl.col("date_column").dt.month().alias("month"), pl.col("date_column").dt.day().alias("day"))
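# Resampling time series data (the frame must be sorted by the datetime column)
df = df.sort("date_column").group_by_dynamic("date_column", every="1d").agg(pl.col("column").sum()) # Group into daily windows and sum values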
Visualization
# Plotting needs an optional backend (Altair in recent Polars releases, hvPlot in older ones)
# Plotting a line chart
df.plot.line(x="x_column", y="y_column")
# Plotting a bar chart
df.plot.bar(x="x_column", y="y_column")
# Plotting a scatter plot
df.plot.scatter(x="x_column", y="y_column")
# Plotting a histogram of a single column
df["column"].plot.hist()
Hope you found the above quick summary of Polars useful! This cheat sheet covers some commonly used functionalities in Polars, but there's a lot more to explore. If you want to dive deeper or need more information, I recommend referring to the official Polars documentation for comprehensive guidance:
Polars Documentation (Rust crate): https://docs.rs/polars/latest/polars/
Polars User Guide and Python API Reference: https://docs.pola.rs/
The documentation provides detailed explanations, examples, and additional features available in Polars. It's a great resource to further enhance your understanding and proficiency with Polars. Happy coding!