
Data Scientist Tutorials

Videos

Data Science Hands-on with Open Source Tools (Archived)
Data Science Tools
PyCon Cleveland 2018 – Python For Data Science (2018 Intro Course)

Text-Tutorials

Google’s Python Class (Python 2.7)
Python Practice Book (Python 2.7)
Python Exercises, Practice, Solution (Python 3)
Practice Python – Beginner Python exercises (Python 3)
Interactive Python 3 Tutorial with over 100 exercises
CodingBat code practice – Python
Python School

PyCharm Edu
Learn X in Y minutes, Where X=python3

Real Python

Python Learning Paths
Machine Learning With Python
An Intro to Threading in Python
Speed Up Your Python Program With Concurrency

Docker in Action – Fitter, Happier, More Productive
Object-Oriented Programming (OOP) in Python 3

Improve Skills by Practice and Problem Solving

wiki.python.org/moin/ProblemSets
exercism.io – Python Track (exercism.io CLI)
Rosalind Bioinformatics

$ exercism download --track=python --exercise=hamming

Downloaded to
/home/andreas/exercism/python/hamming
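
The Hamming exercise asks for the number of positions at which two equal-length strands differ. A minimal sketch of one possible solution follows. The function name `distance` and the choice to raise `ValueError` for strands of unequal length are assumptions here, so check the README in the downloaded directory for the exact requirements.

```python
def distance(strand_a: str, strand_b: str) -> int:
    """Count the positions at which two equal-length strands differ."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be of equal length")
    # zip pairs up the characters; True counts as 1 when summed
    return sum(a != b for a, b in zip(strand_a, strand_b))


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # prints 7
```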

Jupyter and IPython

Installation

Package: jupyter-notebook (5.2.2-1)
Package: jupyter-notebook (5.4.1-1)

Documentation

Jupyter/IPython Notebook Quick Start Guide
Running the Jupyter Notebook
The Jupyter Notebook – Opening notebooks
How IPython works (deprecated)
Making kernels for IPython (deprecated)

JetBrains PyCharm

Using IPython/Jupyter Notebook with PyCharm

Create Virtualenv Environment

Using jupyter notebooks with a virtual environment
remove kernel on jupyter notebook

$ python -m venv projectname
$ source projectname/bin/activate
(projectname) $ pip install ipykernel
(projectname) $ ipython kernel install --user --name=projectname
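
The environment-creation step above can also be scripted with the standard-library `venv` module, which may be handy in project setup scripts. This is a minimal sketch: the helper name `create_env` is hypothetical, and installing `ipykernel` and registering the kernel still happen with the shell commands shown above.

```python
import venv
from pathlib import Path


def create_env(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return its directory.

    with_pip=True bootstraps pip into the environment, which is needed
    for the later `pip install ipykernel` step.
    """
    env_dir = Path(path)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling `create_env("projectname")` should leave a `projectname/pyvenv.cfg` file and a `bin/` (or `Scripts/` on Windows) directory, the same layout that `python -m venv projectname` produces.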

MathJax

Swiss Python Summit 2018

Python Summit Talk Recordings

Peter Hoffmann – 12 Factor Apps for Data-Science with Python

Books

Python for Data Analysis, 2nd Edition
IPython Interactive Computing and Visualization Cookbook, Second Edition
Jupyter for Data Science

Software as a service (SaaS)

  • Cloud Computing
  • Infrastructure as a Service (IaaS)
  • Platform as a Service (PaaS)
  • Desktop as a Service (DaaS)
  • Managed software as a Service (MSaaS)
  • Mobile backend as a Service (MBaaS)
  • Information Technology Management as a Service (ITMaaS)

Centralized hosting of business applications dates back to the 1960s. Starting in that decade, IBM and other mainframe providers conducted a service bureau business, often referred to as time-sharing or utility computing. Such services included offering computing power and database storage to banks and other large organizations from their worldwide data centers.

Snowflake, The Only Data Warehouse Built for the Cloud
Welcome to the Snowflake Documentation
Key Concepts & Architecture

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.

Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.

The Jupyter Notebook, formerly known as the IPython Notebook
The Jupyter Notebook
github.com/jupyter/jupyter/wiki/Jupyter-kernels, Jupyter kernels
How do I install different languages in Jupyter Notebook?

Data Scientist / Services for Machine Learning

Jupyter

A web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

pandas

Apache Parquet

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
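
To illustrate what "columnar" means, the sketch below contrasts a row-oriented layout with a column-oriented one in plain Python. It is only a conceptual illustration with made-up sample data and a hypothetical helper `to_columns`, not Parquet's actual on-disk format, which adds row groups, encodings, and compression.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"id": 1, "city": "Bern", "temp": 12.5},
    {"id": 2, "city": "Zurich", "temp": 14.0},
    {"id": 3, "city": "Basel", "temp": 9.5},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over one field only needs to touch that field's values.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
print(columns["city"])  # prints ['Bern', 'Zurich', 'Basel']
print(mean_temp)        # prints 12.0
```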

Apache Arrow

A cross-language development platform for in-memory data.
Reading and Writing the Apache Parquet Format

Amit Kumar – Let’s Talk About GIL!

SymPy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

Global Interpreter Lock (GIL)
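
In standard CPython builds, the GIL allows only one thread to execute Python bytecode at a time, so CPU-bound work usually gains little from threads. The sketch below times the same busy loop run twice in sequence and in two threads. The function names are hypothetical, and the timings depend on the machine and interpreter build (free-threaded builds behave differently), so no expected numbers are shown.

```python
import threading
import time


def count_down(n: int) -> int:
    """Pure-Python busy loop: CPU-bound work that competes for the GIL."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n: int, repeats: int = 2) -> float:
    """Run count_down `repeats` times in the current thread; return seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int = 2) -> float:
    """Run count_down once in each of `workers` threads; return seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential: {run_sequential(n):.3f} s")
    print(f"threaded:   {run_threaded(n):.3f} s")
```

On a build with the GIL, the two timings are typically close to each other. For I/O-bound work, threads can still help because the GIL is released while waiting.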