Data Science in Python

Let’s look at the ‘2D Array’, a basic element of Python that plays an important role in Data Science. One key task of Data Science is cleaning and fixing data held in a 2D array data source, which is commonly said to take about 70%-80% of Data Science time.

Array Index: a slice includes its start index and excludes its stop index, so word == word[:2] + word[2:]

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
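
The diagram above can be checked with a few lines of plain Python. This is a minimal sketch; the variable names are chosen only for illustration.

```python
word = "Python"

# Non-negative indices count from the left, negative ones from the right.
first = word[0]       # 'P'
last = word[-1]       # 'n'

# A slice word[start:stop] includes start and excludes stop.
head = word[:2]       # 'Py'
tail = word[2:]       # 'thon'

# Splitting at a position and rejoining gives back the original string.
rejoined = word[:2] + word[2:]

# Negative indices work in slices too.
last_two = word[-2:]  # 'on'

print(first, last, head, tail, rejoined, last_two)
```

Because the stop index is excluded, the two halves never overlap, which is why the identity holds for any split position.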

Libraries heavily used in Data Science.

The NumPy library (‘import numpy as np’) is one of a few extremely important libraries for data science in Python. It is great for efficiently loading, storing and manipulating in-memory data. Array slicing, ‘a[start:stop:step]’, is often used in data cleaning.
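
As a rough illustration of ‘a[start:stop:step]’ on a 2D array, the sketch below slices a small made-up array; the array contents and variable names are assumptions for the example, not data from this post.

```python
import numpy as np

# Hypothetical 3x4 array: think of rows as records and columns as features.
a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = a[:2]       # rows 0 and 1, all columns
second_column = a[:, 1]      # column 1 of every row
every_other_col = a[:, ::2]  # columns 0 and 2 (step of 2)
reversed_rows = a[::-1]      # rows in reverse order (step of -1)
block = a[1:, 2:]            # rows 1 onward, columns 2 onward

print(second_column)
print(block)
```

Note that NumPy slices are views onto the original array rather than copies, so writing into a slice also changes ‘a’; call ‘.copy()’ on the slice when an independent array is needed.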


Other aggregation functions
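NumPy provides other aggregation functions as well, such as np.sum, np.prod, np.mean, np.std, np.var, np.min, np.max and np.median. Most NumPy aggregates have a ‘NaN-safe’ version (for example np.nansum, np.nanmean and np.nanmax), which computes the result while ignoring missing values marked by the NaN value.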
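
The difference between a plain aggregate and its NaN-safe version can be seen in a short sketch. The sample values below are made up for illustration.

```python
import numpy as np

# Hypothetical measurements with one missing value.
values = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(values)      # the NaN propagates, so the result is NaN
safe_sum = np.nansum(values)    # ignores the NaN: 1 + 2 + 4
safe_mean = np.nanmean(values)  # mean of the three valid values
safe_max = np.nanmax(values)    # largest valid value

print(plain_sum, safe_sum, safe_mean, safe_max)
```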

The Pandas library (‘import pandas as pd’) does a lot to make working with data, including importing, cleaning and organizing it, so much easier that it is hard to imagine doing data science in Python without it. One core element of Pandas is the DataFrame, a labeled 2D array that Data Science uses for much of its data cleaning and manipulation. The indexers ‘loc’ and ‘iloc’ select data by label (the explicit index) and by integer position (the implicit index), respectively.
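
To make the ‘loc’ versus ‘iloc’ distinction concrete, here is a minimal sketch. The DataFrame contents are hypothetical, and the row labels are deliberately not 0, 1, 2 so that labels and positions differ.

```python
import pandas as pd

# Hypothetical data with explicit row labels 10, 20, 30.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4.5, 19.0, 27.5]},
    index=[10, 20, 30],
)

by_label = df.loc[10, "city"]  # row labelled 10, column "city"
by_position = df.iloc[0, 0]    # first row, first column

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc[10:20, "temp"]  # rows labelled 10 and 20
position_slice = df.iloc[0:2, 1]     # rows at positions 0 and 1

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```

Both slices return the same two rows here, but for different reasons: ‘loc’ treats 20 as a label and includes it, while ‘iloc’ treats 2 as a position and stops before it.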


DataFrame methods commonly used to inspect and clean data: DataFrame.info(), DataFrame.head(), DataFrame.tail(), DataFrame.isnull(), DataFrame.notnull(), DataFrame.dropna(), DataFrame.fillna(), DataFrame.corr()
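
A short sketch of how these methods might fit together in a cleaning pass is shown below. The column names and values are invented for the example, and filling with the column mean is only one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value in each column.
df = pd.DataFrame({
    "height": [1.60, 1.75, np.nan, 1.82],
    "weight": [55.0, np.nan, 68.0, 80.0],
})

df.info()                               # prints dtypes and non-null counts
first_rows = df.head(2)                 # first two rows
last_rows = df.tail(2)                  # last two rows

missing_per_column = df.isnull().sum()  # count of NaN values per column
complete_rows = df.dropna()             # keep only rows with no NaN
filled = df.fillna(df.mean())           # replace NaN with the column mean

correlation = filled.corr()             # pairwise correlation of columns
print(missing_per_column)
print(correlation)
```

Whether to drop or fill missing values depends on the data: dropping loses rows, while filling keeps them at the cost of introducing estimated values.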

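The Seaborn library (‘import seaborn as sns’) is a plotting tool for visualizing distributions, with functions such as sns.distplot(), sns.jointplot() and sns.pairplot(). Note that sns.distplot() has been deprecated since Seaborn 0.11 in favor of sns.histplot() and sns.displot().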
