2021. 8. 13. 11:28 · Tips, Analysis Environment Setup / Python Development Tips
An earlier post about Numba:
https://data-newbie.tistory.com/390
Example) Monte Carlo method
import random
from numba import jit

@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

def monte_carlo_pi_no_numba(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
%time monte_carlo_pi_no_numba(10000)
#CPU times: user 5.53 ms, sys: 808 µs, total: 6.34 ms
#Wall time: 6.07 ms
# First JIT call: includes compilation time
%time monte_carlo_pi(10000)
#CPU times: user 329 ms, sys: 40 ms, total: 369 ms
#Wall time: 451 ms
# Second JIT call: reuses the compiled code
%time monte_carlo_pi(10000)
#CPU times: user 61 µs, sys: 45 µs, total: 106 µs
#Wall time: 109 µs
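- The performance difference is clear: the first JIT call is slow because it includes compilation (451 ms), but the second call takes 109 µs, compared with 6.07 ms for plain Python.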
A case that goes wrong
def original_function(input_list):
    output_list = []
    for item in input_list:
        if item % 2 == 0:
            output_list.append(2)
        else:
            output_list.append('1')
    return output_list

test_list = list(range(100000))
original_function(test_list)[0:10]
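- With the code below, Numba emits warnings: type inference fails and compilation falls back to object mode. In more recent Numba releases, where jit defaults to nopython mode, the same code raises an error instead.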
jitted_function = jit()(original_function)
jitted_function(test_list)[0:10]
<ipython-input-5-bff7290ad327>:1: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "original_function" failed type inference due to: Invalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](1))
During: resolving callee type: BoundFunction(list.append for list(int64)<iv=None>)
During: typing of call at <ipython-input-5-bff7290ad327> (7)
File "<ipython-input-5-bff7290ad327>", line 7:
def original_function(input_list):
<source elided>
else:
output_list.append('1')
^
def original_function(input_list):
<ipython-input-5-bff7290ad327>:1: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "original_function" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>
File "<ipython-input-5-bff7290ad327>", line 3:
def original_function(input_list):
<source elided>
output_list = []
for item in input_list:
^
warnings.warn(errors.NumbaDeprecationWarning(msg,
[2, '1', 2, '1', 2, '1', 2, '1', 2, '1']
The elements of a list must all have the same type. The function does still run here, but there is no speedup.
In this run it was actually slower.
%time _ = original_function(test_list)
CPU times: user 18.9 ms, sys: 0 ns, total: 18.9 ms
Wall time: 18.7 ms
%time _ = jitted_function(test_list)
CPU times: user 29.9 ms, sys: 734 µs, total: 30.6 ms
Wall time: 30.4 ms
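Numba lets code like this run anyway by falling back to object mode, so you can still "use" Numba, but you get no speedup! With njit (nopython mode), the same function fails outright: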
from numba import njit
njitted_function = njit()(original_function)
njitted_function(test_list)[0:5]
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
<ipython-input-12-61341abb3f42> in <module>
1 njitted_function = njit()(original_function)
----> 2 njitted_function(test_list)[0:5]
/opt/conda/lib/python3.8/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
412 e.patch_message(msg)
413
--> 414 error_rewrite(e, 'typing')
415 except errors.UnsupportedError as e:
416 # Something unsupported is present in the user code, add help info
/opt/conda/lib/python3.8/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
355 raise e
356 else:
--> 357 raise e.with_traceback(None)
358
359 argtypes = []
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](1))
During: resolving callee type: BoundFunction(list.append for list(int64)<iv=None>)
During: typing of call at <ipython-input-5-bff7290ad327> (7)
File "<ipython-input-5-bff7290ad327>", line 7:
def original_function(input_list):
<source elided>
else:
output_list.append('1')
^
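Using Numba on a Python list can actually make things slower.
Simply replacing the list with an np.array improves performance.
A list with two different element types is tolerated only through the object-mode fallback, and it stays slow. Even (2, 1.5) counts as mixed, since it is (int, float), so it can affect speed.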
Creating and returning lists from JIT-compiled functions is supported, as well as all methods and operations. Lists must be strictly homogeneous: Numba will reject any list containing objects of different types, even if the types are compatible (for example, [1, 2.5] is rejected as it contains a int and a float).
# https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#list
def sane_function(input_list):
    output_list = []
    for item in input_list:
        if item % 2 == 0:
            output_list.append(2)
        else:
            output_list.append(1)
    return output_list

test_list = list(range(100000))
%time sane_function(test_list)[0:5]
CPU times: user 15.8 ms, sys: 0 ns, total: 15.8 ms
Wall time: 15.6 ms
[2, 1, 2, 1, 2]
njitted_sane_function = njit()(sane_function)
%time njitted_sane_function(test_list)[0:5]
/opt/conda/lib/python3.8/site-packages/numba/core/ir_utils.py:2067: NumbaPendingDeprecationWarning:
Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'input_list' of function 'sane_function'.
For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types
File "<ipython-input-13-9a7e18fa2d25>", line 1:
def sane_function(input_list):
^
warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))
CPU times: user 290 ms, sys: 3.88 ms, total: 293 ms
Wall time: 291 ms
[2, 1, 2, 1, 2]
import numpy as np
test_list = np.arange(100000)
%time njitted_sane_function(test_list)[0:5]
#CPU times: user 2.33 ms, sys: 0 ns, total: 2.33 ms
#Wall time: 7.9 ms
#[2, 1, 2, 1, 2]
vectorize
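In the results below, the same function takes a different amount of time on the first and second calls.
The first call takes longer because the function is actually being compiled.
The second call shows the dramatic speedup you ultimately get.
vectorize makes sure an output array of the right size is allocated up front, so it is an optimization over the earlier form of the function, where the list grew to an unknown size. You can also fix this in the original function by allocating the output array first.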
from numba import vectorize

@vectorize(nopython=True)
def non_list_function(item):
    if item % 2 == 0:
        return 2
    else:
        return 1
%time non_list_function(test_list)
CPU times: user 68.9 ms, sys: 3.9 ms, total: 72.8 ms
Wall time: 72.2 ms
array([2, 1, 2, ..., 1, 2, 1])
%time non_list_function(test_list)
CPU times: user 0 ns, sys: 539 µs, total: 539 µs
Wall time: 309 µs
array([2, 1, 2, ..., 1, 2, 1])
Example) spring mass
from IPython.display import Image
Image('https://upload.wikimedia.org/wikipedia/commons/f/fa/Spring-mass_under-damped.gif')
# Let's mix wet friction with dry friction, this makes the behavior
# of the system dependent on the initial condition, something
# may be interesting to study by running an exhaustive simulation
def friction_fn(v, vt):
    if v > vt:
        return - v * 3
    else:
        return - vt * 3 * np.sign(v)

def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
    times = np.arange(0, T, dt)
    positions = np.zeros_like(times)
    v = 0
    a = 0
    x = x0
    positions[0] = x0/x0
    for ii in range(len(times)):
        if ii == 0:
            continue
        t = times[ii]
        a = friction_fn(v, vt) - 100*x
        v = v + a*dt
        x = x + v*dt
        positions[ii] = x/x0
    return times, positions
import matplotlib.pyplot as plt
plt.plot(*simulate_spring_mass_funky_damper(0.1))
plt.plot(*simulate_spring_mass_funky_damper(1))
plt.plot(*simulate_spring_mass_funky_damper(10))
plt.legend(['0.1', '1', '10'])
%time _ = simulate_spring_mass_funky_damper(0.1)
CPU times: user 267 ms, sys: 2.78 ms, total: 269 ms
Wall time: 268 ms
@njit
def friction_fn(v, vt):
    if v > vt:
        return - v * 3
    else:
        return - vt * 3 * np.sign(v)

@njit
def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
    times = np.arange(0, T, dt)
    positions = np.zeros_like(times)
    v = 0
    a = 0
    x = x0
    positions[0] = x0/x0
    for ii in range(len(times)):
        if ii == 0:
            continue
        t = times[ii]
        a = friction_fn(v, vt) - 100*x
        v = v + a*dt
        x = x + v*dt
        positions[ii] = x/x0
    return times, positions
_ = simulate_spring_mass_funky_damper(0.1)
%time _ = simulate_spring_mass_funky_damper(0.1)
CPU times: user 931 µs, sys: 209 µs, total: 1.14 ms
Wall time: 1.16 ms
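With nothing more than njit added to the existing code, it runs more than 200 times faster (268 ms down to 1.16 ms).
However, according to the source, running many of these simulations from a thread pool does not actually look faster, and htop shows that not all cores are being used.
This is because Numba functions do not release the global interpreter lock (GIL) by default.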
GIL(NOGIL)
%%time
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(8) as ex:
ex.map(simulate_spring_mass_funky_damper, np.arange(0, 1000, 0.1))
CPU times: user 10.1 s, sys: 173 ms, total: 10.2 s
Wall time: 2.52 s
Parallel
Numba can actually parallelize code natively, but in this case you need to define a wrapper function.
from numba import prange

@njit(nogil=True, parallel=True)
def run_sims(end=1000):
    for x0 in prange(int(end/0.1)):
        if x0 == 0:
            continue
        simulate_spring_mass_funky_damper(x0*0.1)
run_sims()
CPU times: user 10.5 s, sys: 12.4 ms, total: 10.5 s
Wall time: 271 ms
I came across this by chance and am posting it for now because I think I need to practice it. Sources:
https://www.youtube.com/watch?v=x58W9A2lnQc&ab_channel=JackofSome
https://gist.github.com/safijari/fa4eba922cea19b3bc6a693fe2a97af7