Python Cheat Sheet

Download link

Power BI for Beginners Part-1

In this series you will learn how to use Power BI to connect to data, analyze and model that data to uncover insights, track how the data changes over time, and make better business decisions.

Power BI has a desktop application that lets you connect to data, model it, analyze and enrich it, and then visualize that data. There are also mobile applications, and there is a programmatic approach for working with the data.

We will start with Power BI Desktop, connect to Excel spreadsheet data, pull the data in and build some visualizations, then see how we can publish that to the powerbi.com service, build dashboards from it, and share and collaborate across the organization. Continue reading

Facebook Data Analytics with Python Part-1

In this series of blog posts I will explain how you can use the Graph API from Python to make simple requests and do some data analysis with the results.

So what is the Graph API?

It is an API developed by Facebook that you can use to get data from Facebook, for example data from pages, groups, or even your personal profile. Facebook contains a huge amount of data that is valuable to you as a data analyst, scientist, or researcher. We will use Python and pandas for this series.
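Below is a minimal sketch of such a request in Python, assuming the requests and pandas packages are installed; the page ID, access token, and Graph API version are placeholders you would replace with your own values from developers.facebook.com.

import requests
import pandas as pd

# Placeholders: use your own page ID, access token, and a current API version.
PAGE_ID = "your_page_id"
ACCESS_TOKEN = "your_access_token"

url = "https://graph.facebook.com/v2.10/{}/posts".format(PAGE_ID)
params = {
    "access_token": ACCESS_TOKEN,
    "fields": "message,created_time,shares",
    "limit": 25,
}

# Make a simple GET request to the Graph API and load the posts into pandas.
resp = requests.get(url, params=params)
resp.raise_for_status()
posts = resp.json().get("data", [])

df = pd.DataFrame(posts)
print(df.head())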

We will work through some simple examples on pages and groups and pull out things like which posts or comments are the most engaging and when they appear. Hope you will enjoy it. Continue reading

Linear Regression in Microsoft Azure Cloud End to End Flow

What is Regression?

In simple words, it is a relationship! Exactly what we have on Facebook, but here it is not between two human beings; it is between two sets of numbers.

Examples: Sales ~ price

What will happen to sales if the price increases or decreases? If the price goes up, sales go down; that is the kind of relationship most of us know from past experience. But if I tell you neurofibromin ~ helix-loop-helix (HLH), what will happen to neurofibromin if I increase HLH? (This is what most of us do not know from past experience.) Regression tells us what the relationship between sales ~ price or neurofibromin ~ HLH is: if one increases, what happens to the other. That is the relationship we can find out. Continue reading
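As a minimal sketch of the idea in plain Python (not the Azure flow covered in the full post), here is a linear regression fitted with scikit-learn on a tiny made-up price/sales dataset; the numbers are purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one predictor (price) and one response (sales).
price = np.array([[10], [12], [14], [16], [18], [20]])
sales = np.array([200, 185, 170, 152, 140, 125])

model = LinearRegression().fit(price, sales)

# The slope is the relationship: roughly how many units of sales change
# for every one-unit change in price.
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("predicted sales at price 15:", model.predict([[15]])[0])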

Comparative Analysis On Azure Data Store

Azure Storage: 

Table: A NoSQL key-value store for rapid development with massive semi-structured datasets. It is highly scalable to petabytes, scales dynamically based on load, and offers fast key/value lookups. We can consider it an alternative to a relational DB when we need something highly scalable and schemaless.

Queue: Lets applications absorb unexpected traffic bursts and prevents servers from being overwhelmed by a sudden flood of requests. Instead of getting dropped, incoming requests are buffered in the queue until servers catch up, so traffic bursts don't take down your applications. Continue reading
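A rough sketch of this buffering pattern in Python, using the azure-storage-queue package; the connection string and queue name are placeholders, it assumes the queue already exists, and the exact SDK details should be checked against the current Azure docs.

from azure.storage.queue import QueueClient

# Placeholder connection string and queue name; assumes the queue exists.
queue = QueueClient.from_connection_string(
    conn_str="<your-storage-connection-string>",
    queue_name="incoming-requests",
)

# Front end: enqueue each incoming request instead of calling the back end
# directly, so bursts are absorbed by the queue rather than dropped.
queue.send_message("request payload 1")
queue.send_message("request payload 2")

# Worker: drain the queue at its own pace once servers catch up.
for msg in queue.receive_messages():
    print("processing:", msg.content)
    queue.delete_message(msg)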

Getting Started with Azure Data Lake for Analytics

What is Azure Data Lake:

It is a new flavor of Azure Blob Storage that can handle streaming data (low latency, high volume, small updates), is data-locality aware, and allows individual files to be sized at petabyte scale. It is basically HDFS as a service. We can store all types of data here: structured data like relational tables, unstructured data like logs, and semi-structured data like JSON and XML files. While storing data inside Azure Data Lake there is no need to define a schema; it only supports schema on read.
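To illustrate schema on read in a generic way (a small pandas sketch, not the Data Lake API itself): the raw JSON is stored exactly as it arrives, and the schema is only applied when the data is read for analysis.

import io
import pandas as pd

# Raw semi-structured data stored as-is; no schema was declared when it was written.
raw = io.StringIO(
    '{"id": "101", "amount": "250.5", "country": "IN"}\n'
    '{"id": "102", "amount": "99.0", "country": "US"}\n'
)

# The schema is decided here, at read time.
df = pd.read_json(raw, lines=True)
df = df.astype({"id": "int64", "amount": "float64", "country": "object"})
print(df.dtypes)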

Flow:

Continue reading

Comparative difference between partitioning and bucketing in hive

This is one of the most common questions I find people getting confused about, so I decided to write a simple explanation of it.

Partitioning in Hive offers a way of segregating Hive table data into multiple files/directories. But partitioning gives effective results only when:

  • There are a limited number of partitions
  • The partitions are of comparatively equal size

But this may not be possible in all scenarios. For example, when we partition our tables on a geographic column like country, a few big countries will produce large partitions (4-5 countries by themselves may contribute 70-80% of the total data) whereas small countries will create small partitions (all the remaining countries in the world together may contribute just 20-30% of the total data). In these cases partitioning is not ideal. Continue reading

Step By Step Process to Create Crystal Layout for SAP Business One

In this example I will create a layout for Sales - A/R.

1. Open SAP Crystal Reports for SAP Business One.

2. Go to File -> New -> Blank Report. It will ask for a data source to connect to.

3. Create a new connection for SAP Business One.

4. Open the connection, go to B1Tables -> Category, select the tables as shown below, and click OK. Complete Steps

Continue reading

Sqoop Import

 

Sqoop import examples: Lesson 1

________________________________________

$ mysql -u root
mysql>

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| datahub            |
| datamart           |
| halitics           |
| hivemetastore      |
| movielens          |
| mysql              |
| product            |
| siri               |
| training           |
+--------------------+
10 rows in set (0.11 sec)

mysql>

mysql> create database practice;
Query OK, 1 row affected (0.02 sec)

mysql> use practice;
Database changed

mysql> show tables;
Empty set (0.00 sec)

mysql> create table samp(id int primary key,
->     name char(10), sal int,
->   sex char(1), dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> describe samp;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id    | int(11)  | NO   | PRI | NULL    |       |
| name  | char(10) | YES  |     | NULL    |       |
| sal   | int(11)  | YES  |     | NULL    |       |
| sex   | char(1)  | YES  |     | NULL    |       |
| dno   | int(11)  | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql>

mysql>
mysql> insert into samp values(101,'aaaa',10000,'m',12)
-> ;
Query OK, 1 row affected (0.01 sec)

mysql> insert into samp values(102,'addaaa',20000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(103,'ada',20000,'f',13);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(104,'ada',50000,'f',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(105,'addda',70000,'m',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(106,'adddda',80000,'m',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(107,'xadddda',70000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql>

mysql> select * from samp;
+-----+---------+-------+------+------+
| id  | name    | sal   | sex  | dno  |
+-----+---------+-------+------+------+
| 101 | aaaa    | 10000 | m    |   12 |
| 102 | addaaa  | 20000 | f    |   12 |
| 103 | ada     | 20000 | f    |   13 |
| 104 | ada     | 50000 | f    |   11 |
| 105 | addda   | 70000 | m    |   12 |
| 106 | adddda  | 80000 | m    |   11 |
| 107 | xadddda | 70000 | f    |   12 |
+-----+---------+-------+------+------+
7 rows in set (0.00 sec)

______________________

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table samp  --target-dir sqimp1

[training@localhost ~]$ hadoop fs -ls sqimp1
Found 6 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_logs
-rw-r--r--   1 training supergroup         42 2016-06-01 19:36 /user/training/sqimp1/part-m-00000
-rw-r--r--   1 training supergroup         38 2016-06-01 19:36 /user/training/sqimp1/part-m-00001
-rw-r--r--   1 training supergroup         21 2016-06-01 19:36 /user/training/sqimp1/part-m-00002
-rw-r--r--   1 training supergroup         45 2016-06-01 19:36 /user/training/sqimp1/part-m-00003
[training@localhost ~]$

[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00001
103,ada,20000,f,13
104,ada,50000,f,11
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00002
105,addda,70000,m,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00003
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

By default Sqoop initiates 4 mappers.

These mappers import the data in parallel.

That means the table is split into 4 parts, and each part is taken by a separate mapper.

How do we control the number of mappers?

Use the option -m <number>

ex:   -m 2

[training@localhost ~]$ sqoop import \
>  --connect  jdbc:mysql://localhost/practice \
>  --username root  \
>  --table samp -m 2 --target-dir sqimp2

[training@localhost ~]$ hadoop fs -ls sqimp2
Found 4 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_logs
-rw-r--r--   1 training supergroup         61 2016-06-01 19:48 /user/training/sqimp2/part-m-00000
-rw-r--r--   1 training supergroup         85 2016-06-01 19:48 /user/training/sqimp2/part-m-00001
[training@localhost ~]$ hadoop fs -cat sqimp2/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$

____________________

mysql> create table damp(id int,
->   name char(10), sal int, sex char(1),
->   dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into damp select * from samp;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select * from damp;
+------+---------+-------+------+------+
| id   | name    | sal   | sex  | dno  |
+------+---------+-------+------+------+
|  101 | aaaa    | 10000 | m    |   12 |
|  102 | addaaa  | 20000 | f    |   12 |
|  103 | ada     | 20000 | f    |   13 |
|  104 | ada     | 50000 | f    |   11 |
|  105 | addda   | 70000 | m    |   12 |
|  106 | adddda  | 80000 | m    |   11 |
|  107 | xadddda | 70000 | f    |   12 |
+------+---------+-------+------+------+
7 rows in set (0.00 sec)

mysql>

If the table does not have a primary key, the number of mappers should be 1.

Why?

When multiple mappers are initiated, the first mapper automatically points to the beginning record of the first split. The remaining mappers cannot point to the beginnings of their splits, because there is no primary key to split the table on.

So only sequential reading is allowed, from the beginning of the table to the end of the table.

To make the whole table, beginning to end, a single split, only one mapper should be initiated:

-m 1.

______________________

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table damp -m 1  --target-dir sqimp4

[training@localhost ~]$ hadoop fs -ls sqimp4
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_logs
-rw-r--r--   1 training supergroup        146 2016-06-01 19:58 /user/training/sqimp4/part-m-00000
[training@localhost ~]$ hadoop fs -cat sqimp4/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

_______________________________

To filter rows at import time, use the --where option:

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table damp -m 1 --where 'sal<50000' --target-dir sqimp5

[training@localhost ~]$ hadoop fs -cat sqimp5/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$

______________________

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table damp -m 1 --where 'sex="m"' --target-dir sqimp6

[training@localhost ~]$ hadoop fs -cat sqimp6/part-m-00000
101,aaaa,10000,m,12
105,addda,70000,m,12
106,adddda,80000,m,11
[training@localhost ~]$

_______________________

Importing selective columns with the --columns option:

[training@localhost ~]$ sqoop import  --connect jdbc:mysql://localhost/practice  --username root  --table damp -m 1 --columns name,sal,dno --target-dir sqimp7

[training@localhost ~]$ hadoop fs -cat sqimp7/part-m-00000
aaaa,10000,12
addaaa,20000,12
ada,20000,13
ada,50000,11
addda,70000,12
adddda,80000,11
xadddda,70000,12
[training@localhost ~]$

Continue reading