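Advanced, Part 2

1 JSON (JavaScript Object Notation)
- 1.1 Converting a JSON string to a dictionary
- 1.2 Converting a dictionary to a JSON string
2 Web scraping
3 Big data analysis
4 turtle
- 4.1 Installing turtle
- 4.2 ipyturtle
5 Learning Python with building blocks
6 Databases
- 6.1 Connecting to a local database
- 6.2 Connecting to a cloud database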

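JSON (JavaScript Object Notation)

JSON is a standard text format that represents structured data using JavaScript object syntax. It is widely used to present and transmit data on the web.
A JSON object closely resembles a Python dictionary (dict), and the two can be converted into each other.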

Converting a JSON string to a dictionary

import json

# JSON-formatted string
json格式 = '{ "名字":"强森", "年齡":30, "城市":"新竹市"}'

# Convert the JSON string to a Python dictionary
字典格式 = json.loads(json格式)

# Print the result
print(字典格式)

{'名字': '强森', '年齡': 30, '城市': '新竹市'}

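Converting a dictionary to a JSON string

import json

# Python dictionary (Chinese keys are not recommended)
字典格式 = {
    "name":"强森",
    "age":30,
    "city":"新竹市"
}

# Convert the dictionary to a JSON string
JSON = json.dumps(字典格式)

# Print the result (by default, Chinese characters are escaped as \uXXXX)
print(JSON)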

{"name": "\u5f3a\u68ee", "age": 30, "city": "\u65b0\u7af9\u5e02"}
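If the escaped output is hard to read, `json.dumps` accepts `ensure_ascii=False` to keep non-ASCII characters as they are, and `indent` to pretty-print. A minimal sketch that reuses the dictionary above; the names `compact` and `pretty` are chosen here only for illustration:

```python
import json

字典格式 = {"name": "强森", "age": 30, "city": "新竹市"}

# ensure_ascii=False keeps Chinese characters instead of \uXXXX escapes
compact = json.dumps(字典格式, ensure_ascii=False)
print(compact)

# indent=2 adds line breaks and indentation for readability
pretty = json.dumps(字典格式, ensure_ascii=False, indent=2)
print(pretty)

# Round trip: loading the string gives back an equal dictionary
assert json.loads(compact) == 字典格式
```

When the string is written to a file, the file should be opened with an encoding such as UTF-8 so the unescaped characters are stored correctly.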

Web scraping

The Beautiful Soup module

# Import the Beautiful Soup module
from bs4 import BeautifulSoup

# Raw HTML source
html = """
<html><head><title>樂活學程式</title></head>
<body>
<a id="link1" href="http://ewin.tw/python">樂活學 Python</a>
</body></html>
"""

# Parse the HTML with Beautiful Soup
解析結果 = BeautifulSoup(html, 'html.parser')

# Print the HTML in an indented, readable layout
print(解析結果.prettify())

<html>
 <head>
  <title>
   樂活學程式
  </title>
 </head>
 <body>
  <a href="http://ewin.tw/python" id="link1">
   樂活學 Python
  </a>
 </body>
</html>

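The Requests module

The most basic way to fetch web data in Python is to send an HTTP request with the requests module and download the requested data from the web server:

# Import the requests and Beautiful Soup modules
import requests
from bs4 import BeautifulSoup

# Download the raw HTML source
html = requests.get("http://ewin.tw/python/樂活學程式.html").content

# Parse the HTML with Beautiful Soup
解析結果 = BeautifulSoup(html, 'html.parser')

# Print the HTML in an indented, readable layout
print(解析結果.prettify())

# Title text of the page
print("網頁標題:", 解析結果.title.string)

# First hyperlink in the page
print("網頁中的超鏈結:")
print(解析結果.a)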

<html>
 <head>
  <title>
   樂活學程式
  </title>
 </head>
 <body>
  <a href="http://ewin.tw/python" id="link1">
   樂活學 Python
  </a>
 </body>
</html>
網頁標題: 
   樂活學程式
  
網頁中的超鏈結:
<a href="http://ewin.tw/python" id="link1">
   樂活學 Python
  </a>
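Beyond `title.string` and `.a`, individual parts of a tag can be read directly. The sketch below works offline on the same HTML as the Beautiful Soup example; `get_text(strip=True)` trims the surrounding whitespace visible in the output above, and `find_all` returns every matching tag. The variable names `標題` and `連結清單` are introduced here for illustration.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>樂活學程式</title></head>
<body>
<a id="link1" href="http://ewin.tw/python">樂活學 Python</a>
</body></html>
"""

解析結果 = BeautifulSoup(html, 'html.parser')

# Text of the <title> tag with surrounding whitespace removed
標題 = 解析結果.title.get_text(strip=True)

# Collect (text, href) pairs for every <a> tag that has an href attribute
連結清單 = [(a.get_text(strip=True), a['href'])
        for a in 解析結果.find_all('a', href=True)]

print(標題)
print(連結清單)
```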

Scraping the Taiwan Lottery 威力彩 winning numbers

from bs4 import BeautifulSoup
import requests

# Taiwan Lottery URL
網址 = 'https://www.taiwanlottery.com.tw/'

# Send a GET request
html = requests.get(網址)

# Parse the response with the html parser
解析結果 = BeautifulSoup(html.text, 'html.parser')

# The 威力彩 numbers are in the first element with class="contents_box02"
資料標籤集 = 解析結果.select('.contents_box02')

# Each number is in a div with class="ball_tx ball_green"
號碼集 = 資料標籤集[0].find_all('div', {'class': 'ball_tx ball_green'})

# Print the first six 威力彩 winning numbers
print("威力彩開獎號碼: ", end='')
for i in range(6):
    print(號碼集[i].text, end=' ')

威力彩開獎號碼: 12  25  20  27  06  37

New Taipei City YouBike

Data source: http://ewin.tw/myApp/sample/UBike/
- Extract the sna (station name in Chinese) and ar (address in Chinese) fields from the JSON data
- The data can be parsed with a JSON parser

import requests
import json

# Open data source
網址 = 'http://ewin.tw/myApp/sample/UBike/'

# Send an HTTP GET request
結果 = requests.get(網址)

# Parse the JSON response text into Python objects
資料集 = json.loads(結果.text)

# Take the records out of the response
資料集 = 資料集['result']['records']

# Keep the first 10 records
資料集 = 資料集[0:10]

# Extract the fields and build a list
清單 = []
for 資料 in 資料集:
    清單.append(資料['sna']+' : '+資料['ar'])

# Print the New Taipei City YouBike stations
print('新北市 YouBike')
print('-------------')
print("\n".join(清單))

新北市 YouBike
-------------
大鵬華城 : 新北市新店區中正路700巷3號
汐止火車站 : 南昌街/新昌路口(西側廣場)
汐止區公所 : 新台五路一段/仁愛路口(新台五路側汐止地政事務所前機車停車場)
國泰綜合醫院 : 建成路78號對面停車場
裕隆公園 : 寶中路/品牌路口(東南側)
捷運大坪林站(5號出口) : 中興路三段224號(對面)
汐科火車站(北) : 大同路二段184巷/龍安路202巷(西側)(汐科火車站北站出口前)
興華公園 : 重陽路一段120巷/中華路2巷
三重國民運動中心 : 集美街/重新路四段184巷
捷運三重站(3號出口) : 捷運路/捷運路37巷
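The parsing steps can also be wrapped in a small function so they can be tried without the network. This is a sketch under assumptions: `format_stations` is a hypothetical helper, and the sample text only imitates the `result` / `records` layout used above, with two stations taken from the output. For a live request, `requests.get(網址, timeout=10)` followed by `raise_for_status()` and `json()` on the response would supply the payload.

```python
import json

def format_stations(payload, limit=10):
    """Return 'sna : ar' strings for the first `limit` records."""
    records = payload['result']['records'][:limit]
    return [f"{r['sna']} : {r['ar']}" for r in records]

# Sample text shaped like the service response (illustrative, not live data)
sample_text = '''
{"result": {"records": [
  {"sna": "大鵬華城", "ar": "新北市新店區中正路700巷3號"},
  {"sna": "汐止火車站", "ar": "南昌街/新昌路口(西側廣場)"}
]}}
'''

stations = format_stations(json.loads(sample_text), limit=1)
print("\n".join(stations))
```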

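Big data analysis

pandas

pandas is a Python package for data processing and analysis. It provides data structures and tools for working with tables and time series, and is often described as the Excel of Python.

Displaying Chinese text in pandas plots

pandas plots cannot display Chinese characters by default. Run the following code to fix this.

import matplotlib as mpl

# Set the default font to one with Chinese glyphs (Microsoft YaHei ships with Windows)
mpl.rcParams['font.sans-serif'] = ['Microsoft YaHei']

# Keep the minus sign rendering correctly with this font
mpl.rcParams['axes.unicode_minus'] = False

# Show plots inline in the notebook rather than in a new window
%matplotlib inline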

One-dimensional data: Series

import pandas as pd

data = range(1, 5)
index = list('abcd')

s = pd.Series(data, index)
print(s)

a    1
b    2
c    3
d    4
dtype: int64

# Access the first value by position, then by label
print(s.iloc[0])
print(s['a'])

1
1
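Labels and positions are two separate ways to address the same values. A short sketch of the explicit accessors: `.loc` selects by label and `.iloc` by integer position, and a label slice includes its end label while a position slice does not.

```python
import pandas as pd

s = pd.Series(range(1, 5), index=list('abcd'))

by_label = s.loc['b']          # value stored under label 'b'
by_position = s.iloc[1]        # second value, counting from 0
label_slice = s.loc['b':'c']   # label slices include both ends
position_slice = s.iloc[1:3]   # position slices exclude the end

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```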

Series methods

s+2

a    3
b    4
c    5
d    6
dtype: int64

s*2

a    2
b    4
c    6
d    8
dtype: int64

# Sum
s.sum()

10

# Mean
s.mean()

2.5

# Standard deviation
s.std()

1.2909944487358056
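The value above is the sample standard deviation: pandas divides by n − 1 by default (`ddof=1`), whereas NumPy's `np.std` divides by n (`ddof=0`). A short check on the same Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1, 5), index=list('abcd'))

sample_std = s.std()            # divides by n - 1 (ddof=1, the pandas default)
population_std = s.std(ddof=0)  # divides by n

print(sample_std)
print(population_std)

# NumPy's default matches the population formula
assert np.isclose(population_std, np.std(s.to_numpy()))
```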

# Maximum and minimum
s.max()

4

s.min()

1

# Number of values
s.count()

4

s.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x2904f6c2408>

Two-dimensional data: DataFrame

Creating data from a list

import pandas as pd

datas = [[90,50,70,80],
         [60,70,90,50],
         [33,75,88,60],
         [22,58,66,37]]

df = pd.DataFrame(datas)
df

Creating data from a dictionary

import pandas as pd

datas = {'國文':[90,50,70,80],
         '英文':[60,70,90,50],
         '數學':[33,75,88,60],
         '自然':[22,58,66,37]}

df = pd.DataFrame(datas)
df

Adding a row index

import pandas as pd

datas = {'國文':[90,50,70,80],
         '英文':[60,70,90,50],
         '數學':[33,75,88,60],
         '自然':[22,58,66,37]}
index = ['小明','小智','小王','小慧']

df = pd.DataFrame(datas, index)
df

Adding row and column indexes

import pandas as pd

datas=[[90,60,55,33],
       [50,70,75,58],
       [70,90,88,66],
       [80,50,60,88]]
index = ['小明','小智','小王','小慧']
columns=['國文','英文','數學','自然']

df = pd.DataFrame(datas, index=index, columns=columns)
df

DataFrame methods

# Sum of each column
df.sum()

國文    290
英文    270
數學    278
自然    245
dtype: int64

# Mean of each column
df.mean()

國文    72.50
英文    67.50
數學    69.50
自然    61.25
dtype: float64

# Standard deviation of each column
df.std()

國文    17.078251
英文    17.078251
數學    14.977761
自然    22.706460
dtype: float64

# Maximum and minimum of each column
df.max()

國文    90
英文    90
數學    88
自然    88
dtype: int64

df.min()

國文    50
英文    50
數學    55
自然    33
dtype: int64
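These methods work column by column by default (`axis=0`). Passing `axis=1` aggregates across each row instead, which here gives one result per student. A sketch using the same scores; the names `總分` and `平均` are introduced for illustration:

```python
import pandas as pd

datas = [[90, 60, 55, 33],
         [50, 70, 75, 58],
         [70, 90, 88, 66],
         [80, 50, 60, 88]]
index = ['小明', '小智', '小王', '小慧']
columns = ['國文', '英文', '數學', '自然']
df = pd.DataFrame(datas, index=index, columns=columns)

# Total and average score per student (one value per row)
總分 = df.sum(axis=1)
平均 = df.mean(axis=1)

print(總分)
print(平均)
```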

Processing data with apply()

def 資料處理(x):
    if x >= 60:
        return '及格'
    else:
        return '不及格'

df['國文'].apply(資料處理)

小明     及格
小智    不及格
小王     及格
小慧     及格
Name: 國文, dtype: object
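For a simple threshold like this one, a vectorized comparison avoids calling a Python function for every value. The sketch below uses `numpy.where` as an alternative to `apply()`; it rebuilds the 國文 column on its own so it can run independently, and the labels match the result above.

```python
import numpy as np
import pandas as pd

國文 = pd.Series([90, 50, 70, 80],
               index=['小明', '小智', '小王', '小慧'], name='國文')

# The comparison yields a boolean Series; np.where picks a label per element
結果 = pd.Series(np.where(國文 >= 60, '及格', '不及格'),
               index=國文.index, name=國文.name)
print(結果)
```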

Plotting

Line chart

df.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x2904fa03488>

Bar chart

df.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x2904fa0e4c8>

df.T

df.T.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x2904fafb648>

df[['國文','英文','數學','自然']].plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x29050b731c8>

Example 5

df.plot(kind='bar', stacked=True)

<matplotlib.axes._subplots.AxesSubplot at 0x29050c0d708>

Example 6

df['數學'].plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x29050cab9c8>

Example 7

df['數學'].plot(kind='bar', rot = 45)

<matplotlib.axes._subplots.AxesSubplot at 0x29050d4f6c8>

Example 8

df['數學'].sort_values(ascending=False).\
plot(kind='bar', rot = 45)

<matplotlib.axes._subplots.AxesSubplot at 0x29050e07d88>

Example 9

df['數學'].plot(kind='barh')

<matplotlib.axes._subplots.AxesSubplot at 0x29050e73b48>

Pie chart

df['數學'].plot(kind='pie')

<matplotlib.axes._subplots.AxesSubplot at 0x29050e8f588>

Example 11

explode = [0.15, 0, 0, 0]
df['數學'].plot(kind='pie',explode=explode)

<matplotlib.axes._subplots.AxesSubplot at 0x29050f0ba48>

Example 12

df['數學'].plot(kind='pie' ,autopct='%.1f%%')

<matplotlib.axes._subplots.AxesSubplot at 0x29050f845c8>

Example 13

idx = df['數學'].idxmin()
[0.15 if i == idx else 0 for i in df.index]

[0.15, 0, 0, 0]

Example 14

idx = df['數學'].idxmin()
print(f'分數最低的人是{idx}')

explode = [0.15 if i==idx else 0 for i in df.index]
df['數學'].plot(kind='pie',explode=explode,autopct='%1.1f%%')

分數最低的人是小明

<matplotlib.axes._subplots.AxesSubplot at 0x29050fe0ac8>

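The matplotlib package

Matplotlib plots cannot display Chinese characters by default. Run the following code to fix this.

import matplotlib as mpl

# Set the default font to one with Chinese glyphs (Microsoft YaHei ships with Windows)
mpl.rcParams['font.sans-serif'] = ['Microsoft YaHei']

# Keep the minus sign rendering correctly with this font
mpl.rcParams['axes.unicode_minus'] = False

# Show plots inline in the notebook rather than in a new window
%matplotlib inline

import matplotlib.pyplot as plt

x = [1,2,3]
y = [1,2,3]
plt.plot(x, y)
plt.show()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build a sequence of 200 daily periods starting today
# (pd.datetime was removed from pandas, so pd.Timestamp is used here)
x = pd.period_range(pd.Timestamp.now(), periods=200, freq='D')

# Convert to timestamps, then to Python datetime objects
x = x.to_timestamp().to_pydatetime()

# Three random-walk series of 200 points each (cumulative sums of normal samples)
y = np.random.randn(200, 3).cumsum(0)

# Show the plot
plt.plot(x, y)
plt.show()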

Histogram

import numpy as np
import matplotlib.pyplot as plt

# Generate 100 samples from the standard normal distribution (mean 0, standard deviation 1)
normal_samples = np.random.normal(size=100)

# Plot the histogram
plt.hist(normal_samples, width=0.1)
plt.show()
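To see the numbers behind the bars, `numpy.histogram` returns the count in each bin together with the bin edges. A sketch with a fixed seed (the seed value is arbitrary) so that the run is repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)
normal_samples = rng.normal(size=100)

# 10 equal-width bins between the smallest and largest sample
counts, edges = np.histogram(normal_samples, bins=10)

print(counts)   # number of samples in each bin
print(edges)    # 11 edges delimit 10 bins
```

`plt.hist` uses the same default of 10 bins, so these counts correspond to the bar heights it draws for the same samples.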

Scatter plot

import numpy as np
import matplotlib.pyplot as plt

點數 = 100
梯度 = 0.5

# Generate the x sequence
x = np.array(range(點數))

# Generate noisy y values that rise with x
y = np.random.randn(點數) * 10 + x * 梯度

# Draw the plot
fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(x, y)

fig.suptitle('簡單的散佈圖')
plt.show()

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

顔色 = np.random.rand(N)
面積 = np.pi * (15 * np.random.rand(N))**2

plt.scatter(x, y, s=面積, c=顔色, alpha=0.5)
plt.show()

Pie plot

import numpy as np
import matplotlib.pyplot as plt

# Generate 5 random integers from 1 to 10
data = np.random.randint(1, 11, 5)

# Draw the plot
plt.pie(data)
plt.show()

import matplotlib.pyplot as plt

# Empty axes using the Aitoff map projection
plt.figure()
plt.subplot(111, projection="aitoff")
plt.title("Aitoff")
plt.grid(True)
plt.show()

import matplotlib.pyplot as plt

# Empty axes using the Lambert map projection
plt.figure()
plt.subplot(111, projection="lambert")
plt.title("Lambert")
plt.grid(True)
plt.show()

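The Seaborn package

Seaborn is a high-level plotting package built on matplotlib that makes it easier to create charts.

Histogram

Load the titanic dataset through seaborn:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt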

df = sns.load_dataset('titanic')
df.head()

df['age'].plot(kind='hist')

df['age'].plot(kind='hist',bins=30)

pd.cut(df['age'],30).head()

(pd.cut(df['age'], 30, labels=False)
 .value_counts()   # count the values in each bin
 .sort_index()
 .plot(kind='bar', width=1))

df['age'].value_counts(bins=30).sort_index().plot(kind='bar', width=1)
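`pd.cut` assigns each value to one of the equal-width intervals, and `value_counts(bins=...)` does the binning and the counting in one step. A small sketch with made-up ages (not the titanic data) showing what the two forms produce:

```python
import pandas as pd

ages = pd.Series([2, 15, 22, 22, 35, 41, 58, 80])

# labels=False returns the integer bin number (0..2) instead of an interval
bin_ids = pd.cut(ages, 3, labels=False)
counts_by_id = bin_ids.value_counts().sort_index()

# Same counts, indexed by interval rather than by bin number
counts_by_interval = ages.value_counts(bins=3).sort_index()

print(counts_by_id)
print(counts_by_interval)
```

Both results hold the same counts; only the index differs, which is why either one can feed the bar plot.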

df['age'].plot(kind='box')

df.groupby(['survived','sex']).size().unstack(1)

df.groupby(['survived','sex']).size().unstack(1).\
plot(kind='bar')

df.groupby('survived')['age'].\
plot(kind='hist', bins=30, alpha=0.5, legend=True)

df.groupby('survived')['age'].plot(kind='box')

df.boxplot(column='age', by='survived')

sns.boxplot(x = 'survived', y = 'age', data=df)

sns.boxplot(x = 'survived', y = 'age', data=df, hue='sex')

# violinplot may emit a FutureWarning here; the next two lines suppress it.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.violinplot(x = 'survived', y = 'age', data=df)

sns.violinplot(x = 'survived', y = 'age', data=df, hue='sex', split=True)

df.plot(kind='scatter', x='age', y='fare')

df.plot(kind='scatter', x='age', y='fare', c='survived', cmap='coolwarm')

df.plot(kind='scatter', x='age', y='fare',
        c='survived', cmap='coolwarm', colorbar=False)

df.plot(kind='scatter', x='age', y='fare', c = 'survived',
        s=((df['survived']+0.5)*30), alpha=0.8, cmap='coolwarm')

# Matplotlib 3.6 renamed the seaborn styles; on older versions use 'seaborn'
plt.style.use('seaborn-v0_8')
df.plot(kind='scatter', x='age', y='fare')

print(plt.style.available)

plt.style.use('classic')
df.plot(kind='scatter', x='age', y='fare',figsize=(6,4))

plt.rcParams.update(plt.rcParamsDefault)
%matplotlib inline
df.plot(kind='scatter', x='age', y='fare')

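turtle (turtle graphics module)

Installing turtle

turtle is part of the Python standard library, so it does not need to be installed with pip. It only requires a Python installation with Tk support.

# Load the turtle module
import turtle as tu

# Show the turtle (this opens the drawing window)
tu.showturtle()

# Draw the right-hand line
tu.color('blue')
tu.forward(150)

# Draw the hollow circle on the right
tu.setheading(270)
tu.color('red')
tu.pensize(2)
tu.circle(50)

# Return to the center
tu.pensize(1)
tu.color('blue')
tu.setheading(180)
tu.penup()
tu.forward(150)
tu.pendown()

# Draw the left-hand line
tu.forward(150)
tu.setheading(90)

# Draw the hollow circle on the left
tu.color('red')
tu.pensize(2)
tu.circle(50)
tu.pensize(1)

# Return to the center
tu.setheading(0)
tu.penup()
tu.forward(150)

# Draw the filled triangle in the center
tu.setheading(180)
tu.color('green')
tu.pendown()
tu.begin_fill()
tu.circle(50, 360, 3)
tu.end_fill()

# Keep the window open until the user closes it
tu.done()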

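ipyturtle

turtle works like a drawing board: we give commands to a turtle and it draws on the canvas. Because turtle draws in a separate graphics window, its output does not appear inside a Jupyter notebook. To keep the turtle output in the notebook, run the following commands from the command line of the Jupyter environment:

pip install ipyturtle
jupyter nbextension enable --py --sys-prefix ipyturtle

Reference: https://github.com/gkvoelkl/ipython-turtle-widget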

from ipyturtle import Turtle
tu = Turtle()
tu

tu.forward(150)

Learning Python with building blocks

Databases

Connecting to a local database

Run this section in an offline (local) Jupyter installation.

References
- MongoDB: https://cloud.mongodb.com/
- "Python操作MongoDB看这一篇就够了": https://juejin.im/post/5addbd0e518825671f2f62ee
- "MongoDB - 詳細到讓人牙起來的安裝教學": https://dotblogs.com.tw/explooosion/2018/01/21/040728

import pymongo

# Connect to the MongoDB server running on this machine
client = pymongo.MongoClient(host='localhost', port=27017)

# Select the database and the collection (both are created on first insert)
db = client['test']
collection = db['students']
print(collection)

student1 = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

student2 = {
    'id': '20170202',
    'name': 'Mike',
    'age': 21,
    'gender': 'male'
}

# Insert both documents into the collection
result = collection.insert_many([student1, student2])

# Find and print every student whose age is 20
for student in collection.find({'age': 20}):
    print(student)
print("end")

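Connecting to a cloud database

References
- "Connect to Your Cluster": https://docs.atlas.mongodb.com/tutorial/connect-to-your-cluster/index.html

Notes
- The connection password is the password of the user defined under Database Access, and it must be URL-encoded (see https://ascii.cl/url-encoding.htm).
- When generating the connection string, choose Python version 3.4; if you choose 3.6, also install dnspython (pip install dnspython).
- Under Network Access, add 0.0.0.0 to the whitelist.
- Under Access Management, add Project Cluster Manager and Data Access Read/Write.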
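URL-encoding the credentials can be done with the standard library. A sketch, assuming a hypothetical user name and password and a placeholder host: `urllib.parse.quote_plus` escapes characters such as `@`, `:` and `/` that would otherwise break the connection string.

```python
from urllib.parse import quote_plus

# Hypothetical credentials, for illustration only
username = quote_plus('demo_user')
password = quote_plus('p@ss:word/1')

# <cluster-host> is a placeholder for the host shown in the Atlas console
uri = f'mongodb://{username}:{password}@<cluster-host>:27017/test'
print(uri)
```

The resulting string can then be passed to `pymongo.MongoClient` in place of a hand-written URI.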

import pymongo

# Connect to the cluster; replace <username> and <password> with the
# Database Access user name and its URL-encoded password
client = pymongo.MongoClient("mongodb://<username>:<password>@cluster0-shard-00-00-yrjpa.mongodb.net:27017,cluster0-shard-00-01-yrjpa.mongodb.net:27017,cluster0-shard-00-02-yrjpa.mongodb.net:27017/test?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin&retryWrites=true&w=majority")

# Select the database on the cluster (created on first insert)
db = client.test

# Select the collection in that database
collection = db.students

student1 = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

student2 = {
    'id': '20170202',
    'name': 'Mike',
    'age': 21,
    'gender': 'male'
}

# Insert both documents into the collection
result = collection.insert_many([student1, student2])

# Find and print every student whose age is 20
for student in collection.find({'age': 20}):
    print(student)
