Welcome to pyDataset
Renero
Dataset is mainly for educational purposes. It aims to help those approaching data science in Python for the first time, who must deal with common (and time-consuming) data preparation tasks.
This package collects, behind a very simple interface, the common tasks that are normally performed on pandas DataFrames, such as:
load data
set the target variable
describe the health status of the dataset
drop/keep columns or sample from simple lists
split the dataset
count categorical and numerical features
fix NAs
find correlations
detect skewness
scale numeric values
detect outliers
apply one-hot encoding
find under-represented categorical features
perform stepwise feature selection
compute information gain
plot some useful charts
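To give a sense of what several of these tasks involve, here is a minimal sketch written in plain pandas. It does not use this package's own API (see the API documentation for that); the DataFrame, its column names, and the choices of median/mode imputation and the 1.5 × IQR outlier rule are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame, for illustration only.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0, 41.0, 29.0],
    "income": [20.0, 32.0, 35.0, 400.0, 38.0, 27.0],
    "city": ["Madrid", "Madrid", "Bilbao", None, "Madrid", "Sevilla"],
    "target": [0, 1, 0, 1, 1, 0],
})

# Set the target variable: separate it from the features.
target = df["target"]
features = df.drop(columns=["target"])

# Count numerical and categorical features.
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

# Describe the health of the dataset: missing values per column.
missing_before = features.isna().sum()

# Fix NAs: median for numerical columns, most frequent value for categorical ones.
for col in numerical:
    features[col] = features[col].fillna(features[col].median())
for col in categorical:
    features[col] = features[col].fillna(features[col].mode().iloc[0])

# Detect skewness of the numerical features.
skewness = features[numerical].skew()

# Find correlations between each numerical feature and the target.
correlations = features[numerical].corrwith(target)

# Detect outliers in one column with the 1.5 * IQR rule.
q1, q3 = features["income"].quantile([0.25, 0.75])
iqr = q3 - q1
income_outliers = (features["income"] < q1 - 1.5 * iqr) | (features["income"] > q3 + 1.5 * iqr)

# One-hot encode the categorical features.
encoded = pd.get_dummies(features, columns=categorical)

print(missing_before, skewness, correlations, encoded.head(), sep="\n\n")
```

Each step here is a few lines on its own, but together they add up to the repetitive preparation work that the package aims to wrap.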
Install
To install this package, use pip to install it directly from the Git repository:
$ pip install git+https://github.com/renero/dataset
Data Tutorial / Guide
A comprehensive tutorial with examples:
The API Documentation
If you are looking for information on a specific function or method, this part of the documentation is for you.
pyDataset API documentation