Python) Numba Examples (TODO)

2021. 8. 13. 11:28

    Earlier post on Numba

    https://data-newbie.tistory.com/390

     

     

    EX) Monte Carlo Method

    import random
    from numba import jit
    
    @jit(nopython=True)
    def monte_carlo_pi(nsamples):
        acc = 0
        for i in range(nsamples):
            x = random.random()
            y = random.random()
            if (x ** 2 + y ** 2) < 1.0:
                acc += 1
        return 4.0 * acc / nsamples
    
    def monte_carlo_pi_no_numba(nsamples):
        acc = 0
        for i in range(nsamples):
            x = random.random()
            y = random.random()
            if (x ** 2 + y ** 2) < 1.0:
                acc += 1
        return 4.0 * acc / nsamples
        
    %time monte_carlo_pi_no_numba(10000)
    #CPU times: user 5.53 ms, sys: 808 µs, total: 6.34 ms
    #Wall time: 6.07 ms
    
    # First jitted call: includes compilation time
    %time monte_carlo_pi(10000)
    #CPU times: user 329 ms, sys: 40 ms, total: 369 ms
    #Wall time: 451 ms
    
    # Second jitted call: reuses the compiled code
    %time monte_carlo_pi(10000)
    #CPU times: user 61 µs, sys: 45 µs, total: 106 µs
    #Wall time: 109 µs
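    • The performance difference is clear: the first jitted call includes compilation (451 ms), but the second takes about 109 µs, compared with 6.07 ms for plain Python.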
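
The gap between the first and later calls can also be measured outside a notebook. This is a minimal sketch, assuming a hypothetical helper named time_call built on time.perf_counter. The try/except stand-in for njit is also an assumption, added only so the sketch still runs where Numba is not installed (in that case both timings are plain Python).

```python
import math
import random
import time

try:
    from numba import njit
except ImportError:  # assumption: no-op stand-in so the sketch runs without Numba
    def njit(func):
        return func


@njit
def monte_carlo_pi(nsamples):
    # Count random points that fall inside the unit quarter circle.
    acc = 0
    for _ in range(nsamples):
        x = random.random()
        y = random.random()
        if x * x + y * y < 1.0:
            acc += 1
    return 4.0 * acc / nsamples


def time_call(func, *args):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# With Numba, the first call pays for compilation; later calls reuse the compiled code.
_, first = time_call(monte_carlo_pi, 200_000)
estimate, second = time_call(monte_carlo_pi, 200_000)

print(f"first call:  {first:.6f} s")
print(f"second call: {second:.6f} s")
print(f"pi estimate: {estimate:.4f}")
```

Timing the second call rather than the first is what gives a fair comparison with the non-jitted version.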

    A case that goes wrong

    def original_function(input_list):
        output_list = []
        for item in input_list:
            if item % 2 == 0:
                output_list.append(2)
            else:
                output_list.append('1')
        return output_list
    
    test_list = list(range(100000))
    original_function(test_list)[0:10]
    • The code below produces warnings: type inference fails, so Numba falls back to object mode.
    jitted_function = jit()(original_function)
    jitted_function(test_list)[0:10]
    <ipython-input-5-bff7290ad327>:1: NumbaWarning: 
    Compilation is falling back to object mode WITH looplifting enabled because Function "original_function" failed type inference due to: Invalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](1))
    
    During: resolving callee type: BoundFunction(list.append for list(int64)<iv=None>)
    During: typing of call at <ipython-input-5-bff7290ad327> (7)
    
    
    File "<ipython-input-5-bff7290ad327>", line 7:
    def original_function(input_list):
        <source elided>
            else:
                output_list.append('1')
                ^
    
      def original_function(input_list):
    <ipython-input-5-bff7290ad327>:1: NumbaWarning: 
    Compilation is falling back to object mode WITHOUT looplifting enabled because Function "original_function" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>
    
    File "<ipython-input-5-bff7290ad327>", line 3:
    def original_function(input_list):
        <source elided>
        output_list = []
        for item in input_list:
        ^
    show more (open the raw output data in a text editor) ...
    
        output_list = []
        for item in input_list:
        ^
    
      warnings.warn(errors.NumbaDeprecationWarning(msg,
    [2, '1', 2, '1', 2, '1', 2, '1', 2, '1']

     

    The elements of a list need to share one type. The function does run, but there is no speedup.
    In this measurement it actually came out slower (30.4 ms vs 18.7 ms).
    %time _ = original_function(test_list)
    CPU times: user 18.9 ms, sys: 0 ns, total: 18.9 ms
    Wall time: 18.7 ms
    %time _ = jitted_function(test_list)
    CPU times: user 29.9 ms, sys: 734 µs, total: 30.6 ms
    Wall time: 30.4 ms

    jit() lets code run even when it cannot really be compiled, so you can still use Numba on it, but you get no speedup! njit() raises an error instead:

    from numba import njit
    
    njitted_function = njit()(original_function)
    njitted_function(test_list)[0:5]
    ---------------------------------------------------------------------------
    TypingError                               Traceback (most recent call last)
    <ipython-input-12-61341abb3f42> in <module>
          1 njitted_function = njit()(original_function)
    ----> 2 njitted_function(test_list)[0:5]
    
    /opt/conda/lib/python3.8/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
        412                 e.patch_message(msg)
        413 
    --> 414             error_rewrite(e, 'typing')
        415         except errors.UnsupportedError as e:
        416             # Something unsupported is present in the user code, add help info
    
    /opt/conda/lib/python3.8/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
        355                 raise e
        356             else:
    --> 357                 raise e.with_traceback(None)
        358 
        359         argtypes = []
    
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Invalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](1))
    
    During: resolving callee type: BoundFunction(list.append for list(int64)<iv=None>)
    During: typing of call at <ipython-input-5-bff7290ad327> (7)
    
    
    File "<ipython-input-5-bff7290ad327>", line 7:
    def original_function(input_list):
        <source elided>
            else:
                output_list.append('1')
                ^

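    Using Numba on a Python list turns out to be slower.

    Simply switching the list to an np.array improves performance.

    A list that mixes two different types only runs through the object-mode fallback, and it stays slow. Even (2, 1.5) is an (int, float) pair, so it can be affected as well.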

    Creating and returning lists from JIT-compiled functions is supported, as well as all methods and operations. Lists must be strictly homogeneous: Numba will reject any list containing objects of different types, even if the types are compatible (for example, [1, 2.5] is rejected as it contains a int and a float).
    # https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#list
    def sane_function(input_list):
        output_list = []
        for item in input_list:
            if item % 2 == 0:
                output_list.append(2)
            else:
                output_list.append(1)
        return output_list
    
    test_list = list(range(100000))
    %time sane_function(test_list)[0:5]
    CPU times: user 15.8 ms, sys: 0 ns, total: 15.8 ms
    Wall time: 15.6 ms
    [2, 1, 2, 1, 2]
    
    njitted_sane_function = njit()(sane_function)
    %time njitted_sane_function(test_list)[0:5]
    /opt/conda/lib/python3.8/site-packages/numba/core/ir_utils.py:2067: NumbaPendingDeprecationWarning: 
    Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'input_list' of function 'sane_function'.
    
    For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types
    
    File "<ipython-input-13-9a7e18fa2d25>", line 1:
    def sane_function(input_list):
    ^
    
      warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))
    CPU times: user 290 ms, sys: 3.88 ms, total: 293 ms
    Wall time: 291 ms
    [2, 1, 2, 1, 2]
    
    import numpy as np
    test_list = np.arange(100000)
    %time njitted_sane_function(test_list)[0:5]
    #CPU times: user 2.33 ms, sys: 0 ns, total: 2.33 ms
    #Wall time: 7.9 ms
    #[2, 1, 2, 1, 2]
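
The homogeneity rule quoted above can be checked in plain Python before a list is handed to a jitted function. This is a minimal sketch, assuming a hypothetical helper named is_homogeneous. It is not part of Numba and only compares exact Python types, which is stricter than Numba's own type unification.

```python
def is_homogeneous(items):
    """Hypothetical helper: True when every element has exactly the same Python type."""
    if len(items) == 0:
        return True
    first_type = type(items[0])
    return all(type(item) is first_type for item in items)


print(is_homogeneous([2, 1, 2, 1]))      # True: all int
print(is_homogeneous([2, '1', 2, '1']))  # False: int and str
print(is_homogeneous([1, 2.5]))          # False: int and float
```

A check like this makes the failure visible up front instead of as a TypingError raised from inside the compiler.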

    vectorize

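    In the results below, the same function behaves differently on its first and second calls.

    On the first call the function is actually being compiled, so it takes longer.

    On the second call you can see the full speedup that is ultimately available.

    This is an optimization over the earlier form of the function, in which the list grew to an unknown size: an output of the right size is allocated up front. The same issue can be fixed in the original function by allocating the output array first.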

    from numba import vectorize
    
    @vectorize(nopython=True)
    def non_list_function(item):
        if item % 2 == 0:
            return 2
        else:
            return 1
    %time non_list_function(test_list)
    
    CPU times: user 68.9 ms, sys: 3.9 ms, total: 72.8 ms
    Wall time: 72.2 ms
    array([2, 1, 2, ..., 1, 2, 1])
    
    %time non_list_function(test_list)
    CPU times: user 0 ns, sys: 539 µs, total: 539 µs
    Wall time: 309 µs
    array([2, 1, 2, ..., 1, 2, 1])
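
The remark above about allocating the output array first can be sketched as follows. The function name parity_labels is hypothetical. The try/except stand-in for njit is an assumption, added only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: no-op stand-in so the sketch runs without Numba
    def njit(func):
        return func


@njit
def parity_labels(values):
    # Allocate the whole output once instead of growing a list element by element.
    out = np.empty(values.shape[0], dtype=np.int64)
    for i in range(values.shape[0]):
        if values[i] % 2 == 0:
            out[i] = 2
        else:
            out[i] = 1
    return out


labels = parity_labels(np.arange(10))
print(labels)  # [2 1 2 1 2 1 2 1 2 1]
```

Because the output is a NumPy array with a fixed dtype, the mixed-type problem from the list version cannot occur here.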

    Example) Spring mass

    from IPython.display import Image
    
    Image('https://upload.wikimedia.org/wikipedia/commons/f/fa/Spring-mass_under-damped.gif')

     

    # Let's mix wet friction with dry friction. This makes the behavior
    # of the system dependent on the initial condition, something that
    # may be interesting to study by running an exhaustive simulation.
    
    def friction_fn(v, vt):
        if v > vt:
            return - v * 3
        else:
            return - vt * 3 * np.sign(v)
    
    def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
        times = np.arange(0, T, dt)
        positions = np.zeros_like(times)
        
        v = 0
        a = 0
        x = x0
        positions[0] = x0/x0
        
        for ii in range(len(times)):
            if ii == 0:
                continue
            t = times[ii]
            a = friction_fn(v, vt) - 100*x
            v = v + a*dt
            x = x + v*dt
            positions[ii] = x/x0
        return times, positions
        
    import matplotlib.pyplot as plt
    plt.plot(*simulate_spring_mass_funky_damper(0.1))
    plt.plot(*simulate_spring_mass_funky_damper(1))
    plt.plot(*simulate_spring_mass_funky_damper(10))
    plt.legend(['0.1', '1', '10'])
    
    %time _ = simulate_spring_mass_funky_damper(0.1)
    CPU times: user 267 ms, sys: 2.78 ms, total: 269 ms
    Wall time: 268 ms

     

    @njit
    def friction_fn(v, vt):
        if v > vt:
            return - v * 3
        else:
            return - vt * 3 * np.sign(v)
    
    @njit
    def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
        times = np.arange(0, T, dt)
        positions = np.zeros_like(times)
        
        v = 0
        a = 0
        x = x0
        positions[0] = x0/x0
        
        for ii in range(len(times)):
            if ii == 0:
                continue
            t = times[ii]
            a = friction_fn(v, vt) - 100*x
            v = v + a*dt
            x = x + v*dt
            positions[ii] = x/x0
        return times, positions
    
    _ = simulate_spring_mass_funky_damper(0.1)
    
    
    %time _ = simulate_spring_mass_funky_damper(0.1)
    CPU times: user 931 µs, sys: 209 µs, total: 1.14 ms
    Wall time: 1.16 ms

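    Just by adding the jit decorator to the existing code, it runs more than 200 times faster (268 ms vs 1.16 ms).

    Hmm, reportedly this does not actually look much faster when run across threads, and htop shows that not all cores seem to be in use.

    This is reportedly because Numba functions do not release the global interpreter lock (GIL) by default.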

    GIL(NOGIL)

     

    %%time
    from concurrent.futures import ThreadPoolExecutor
    
    with ThreadPoolExecutor(8) as ex:
        ex.map(simulate_spring_mass_funky_damper, np.arange(0, 1000, 0.1))
        
    CPU times: user 10.1 s, sys: 173 ms, total: 10.2 s
    Wall time: 2.52 s
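
The GIL remark above can be illustrated with Numba's nogil=True option, which lets threads run the compiled function concurrently. This is a minimal sketch, not the original notebook's code: damped_oscillation and run_one are hypothetical, simplified stand-ins for the full simulation. The try/except stand-in for njit is an assumption, added only so the sketch still runs where Numba is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: no-op stand-in so the sketch runs without Numba
    def njit(**options):
        return lambda func: func


@njit(nogil=True)
def damped_oscillation(x0, steps, dt):
    # Simplified damped spring: return the position after `steps` time steps.
    x = x0
    v = 0.0
    for _ in range(steps):
        a = -3.0 * v - 100.0 * x
        v = v + a * dt
        x = x + v * dt
    return x


def run_one(x0):
    return damped_oscillation(float(x0), 2000, 0.0001)


# Warm-up call so compilation happens once, before the threads start.
damped_oscillation(1.0, 10, 0.0001)

starts = np.arange(1, 51) * 0.1  # 0.1, 0.2, ..., 5.0 (avoids x0 == 0)
with ThreadPoolExecutor(8) as ex:
    finals = list(ex.map(run_one, starts))

print(len(finals))  # 50
```

Wrapping ex.map in list() also surfaces any exception raised in a worker, which is silently dropped when the results are never consumed.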

     

    Parallel

    Numba can actually parallelize code natively, but in this case you need to define a wrapper function.

     

    from numba import prange
    @njit(nogil=True, parallel=True)
    def run_sims(end=1000):
        for x0 in prange(int(end/0.1)):
            if x0 == 0:
                continue
            simulate_spring_mass_funky_damper(x0*0.1)
            
    %time run_sims()
    
    CPU times: user 10.5 s, sys: 12.4 ms, total: 10.5 s
    Wall time: 271 ms
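
The wrapper above discards the simulation results. A variant that keeps them can write each result into its own slot of a preallocated array. This is a minimal sketch with a hypothetical, simplified final_position function in place of the full simulation. The try/except stand-ins for njit and prange are assumptions, added only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: serial stand-ins so the sketch runs without Numba
    prange = range

    def njit(**options):
        return lambda func: func


@njit(nogil=True)
def final_position(x0, steps, dt):
    # Simplified damped spring: return the position after `steps` time steps.
    x = x0
    v = 0.0
    for _ in range(steps):
        a = -3.0 * v - 100.0 * x
        v = v + a * dt
        x = x + v * dt
    return x


@njit(parallel=True)
def run_sims(n, steps, dt):
    out = np.empty(n)
    # Each iteration writes only to its own slot, so iterations are independent.
    for i in prange(n):
        out[i] = final_position((i + 1) * 0.1, steps, dt)
    return out


finals = run_sims(50, 2000, 0.0001)
print(finals.shape)  # (50,)
```

Starting the initial condition at (i + 1) * 0.1 avoids the x0 == 0 case that the original wrapper skips with continue.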

     

     

    I came across this by chance and am posting it for now because I think I need to practice it.

     

     

    https://www.youtube.com/watch?v=x58W9A2lnQc&ab_channel=JackofSome 

     

    https://gist.github.com/safijari/fa4eba922cea19b3bc6a693fe2a97af7

     


     
