CS/기타 정리

[파이썬] extend vs. append vs. opreator (리스트 연산 비교)

다시 한 번 시작하기 전에

지난 포스팅에 이은 비정기 연재로 이를 이어나가보고자 한다. 이번에는 리스트에 대해 다루어볼 예정이다. 파이썬에서 두 리스트를 합치는 방법은 다양하다. 크게 다음과 같은 방식으로 이루어질 것이다.

함수를 통해 리스트 1에 리스트 2를 합치기 (extend)
+ 연산을 통해 리스트 1에 리스트 2를 합치기 (+=)
일일히 리스트 2의 원소를 리스트 1에 넣어주기 (append)

셋 모두 공통적으로 O(len(리스트2))의 시간복잡도를 갖지만, 우리는 지난 번 min vs. if 포스팅에서 파이썬의 함수 호출에 대한 오버로딩이 성능 저하를 보여주는 것을 실험을 통해 보인 적이 있다!

2023.05.08 - [CS/기타 정리] - [파이썬] Min vs. If, 좀 더 빠르게 (실행시간 정리)

[파이썬] Min vs. If, 좀 더 빠르게 (실행시간 정리)

시작하기 전에 본 포스팅은 파이썬 알고리즘 문제풀이 도중 생각난 의문들을 해결하기 위한 정리를 수행한 기록이다. 파이썬은 매우 느린 언어로, 연산자 지정, 함수 호출 여부 등에 따라 실행

magentino.tistory.com

이번에도 비슷한 결과를 보일 것으로 가설을 세워 볼 수 있다. 즉 일일히 함수 호출을 발생시키는 append가 제일 느리고, inplace operator +=를 사용하는 결과가 제일 빠르리라 예상해 볼 수 있다. 과연 실제로도 그럴지. 한 번 살펴보도록 하자.

실험 세팅

이전에는 실험 세팅에 대한 기록을 해두지 않았는데, 똑같이 replit에서 제공하는 파이썬 인터프리터를 사용해 볼 것이다. 이번에는 100개의 원소를 지닌 리스트 100개를 연이어 합치는 수행을 100번 수행 후, 소요 시간에 평균을 내 볼 것이다.

테스트 코드는 다음과 같다!

from time import perf_counter
import random

time_result1, time_result2, time_result3 = 0., 0., 0.
sample = range(100)

def plus_operator() :
  lst = list()
  result = 0.
  for i in range(100) :
    _lst = random.choices(sample, k = 100)
    start_time = perf_counter()
    lst += _lst
    result += perf_counter() - start_time
  return result

def extend_func() :
  lst = list()
  result = 0.
  for i in range(100) :
    _lst = random.choices(sample, k = 100)
    start_time = perf_counter()
    lst.extend(_lst)
    result += perf_counter() - start_time
  return result

def append_func() :
  lst = list()
  result = 0.
  for i in range(100) :
    _lst = random.choices(sample, k = 100)
    for num in _lst :
      start_time = perf_counter()
      lst.append(num)
      result += perf_counter() - start_time
  return result

#### testing code
for i in range(100) :
  time_result1 += plus_operator()
  time_result2 += extend_func()
  time_result3 += append_func()

print('+ operator result : {}'.format(time_result1 / 100))
print('extend func result : {}'.format(time_result2 / 100))
print('append func result : {}'.format(time_result3 / 100))

실험 결과

+ operator result : 2.9830659277649828e-05
extend func result :  3.1381390399474186e-05
append func result : 0.002210168097353744

역시 예상대로, inplace add를 사용한 버전이 가장 빠른 상황을 보여 주었다. extend 역시 비슷하게 빠르다(inplace add보다는 느리지만 무시할 만한 수준이라 생각된다). 가장 직관적인 방법인 append는 확실히 느릴 수밖에 없다...

사실 이런 경우에 대해 다른 사람들이 테스트를 해보지 않았을 리가 없다. 아래 링크를 같이 참조해보자.

https://stackoverflow.com/questions/3653298/concatenating-two-lists-difference-between-and-extend

Concatenating two lists - difference between '+=' and extend()

I've seen there are actually two (maybe more) ways to concatenate lists in Python: One way is to use the extend() method: a = [1, 2] b = [2, 3] b.extend(a) the other to use the plus (+) operator: ...

stackoverflow.com

https://stackoverflow.com/questions/14446128/python-append-vs-extend-efficiency

Python - append VS extend efficiency

Here is some code that I wrote using Python: from math import sqrt abundant_list = [] for i in range(12,28123+1): dividor_list = [1] for j in range(2, int(sqrt(i))+1): if i%j == 0...

stackoverflow.com

번외. 작은 데이터에 대한 append vs. extend

그렇다면, 리스트의 크기가 매우 작아서 append로도 어떻게든 퉁쳐볼 수 있는 경우는 어떻게 될까? 이를테면 리스트 2가 [1, 2, 3]인 경우와 같이 합치려는 리스트의 크기가 작은 경우를 생각해 볼 수 있다. 이 경우도 비슷하게 실험을 진행해보자.

테스트 코드는 다음과 같다.

from time import perf_counter
import random

time_result1, time_result2, time_result3 = 0., 0., 0.
sample = range(100)

def plus_operator() :
  result = 0.
  for i in range(100) :
    lst1 = random.choices(sample, k = 2)
    lst2 = random.choices(sample, k = 2)
    start_time = perf_counter()
    lst1 += lst2
    result += perf_counter() - start_time
  return result

def extend_func() :
  result = 0.
  for i in range(100) :
    lst1 = random.choices(sample, k = 2)
    lst2 = random.choices(sample, k = 2)
    start_time = perf_counter()
    lst1.extend(lst2)
    result += perf_counter() - start_time
  return result

def append_func() :
  result = 0.
  for i in range(100) :
    lst1 = random.choices(sample, k = 2)
    lst2 = random.choices(sample, k = 2)
    for num in lst2 :
      start_time = perf_counter()
      lst1.append(num)
      result += perf_counter() - start_time
  return result

#### testing code
for i in range(100) :
  time_result1 += plus_operator()
  time_result2 += extend_func()
  time_result3 += append_func()

print('+ operator result : {}'.format(time_result1 / 100))
print('extend func result :  {}'.format(time_result2 / 100))
print('append func result : {}'.format(time_result3 / 100))

실험 결과는 다음과 같다!

+ operator result : 1.9723009845620254e-05
extend func result :  2.7153820774401538e-05
append func result : 4.1002918369486e-05

역시나 비슷한 결과를 보인다. 한 가지 차이라면, append와 extend, inplace add간의 격차가 매우 많이 줄어들었다는 점이다. 즉 append의 함수 호출 횟수 때문에 발생하는 성능 차이라고 볼 수 있다.

그렇다면, 위 케이스에서 random.choices에 들어가는 k = 1인 경우를 생각해보자. 이 경우는 원소의 개수가 1개인 두 리스트를 합치는 경우(혹은 리스트에 원소 하나를 추가하는 경우)와 동일하다. 즉 append 함수 역시 단 1번만 호출되는 상황이다. 실험의 정확성을 위해 반복 횟수를 10만으로 조정하였다. 이 경우 실험 결과는 다음과 같다.

+ operator result : 3.36962239625791e-05
extend func result :  5.0977304243315305e-05
append func result : 3.0483589114192e-05

원소가 1개인 경우는 리스트의 합연산보다는 append가 종합적으로 더 높은 성능을 보여준다! 또한 이런 특이 케이스에서는 원소 1개를 담기 위해 리스트를 생성하는 것부터가 비용이 발생하므로, append 메서드를 사용하는 게 가장 나은 선택임을 알 수 있다.

요약

리스트의 길이와 관계없이, inplace add(+=), extend, append 순으로 성능이 좋다.
다만, 원소가 1개인 경우는 append를 사용하자.

저작자표시 비영리 동일조건

'CS > 기타 정리' 카테고리의 다른 글

[파이썬] Min vs. If, 좀 더 빠르게 (실행시간 정리) (1)	2023.05.08

Contents

당신이 좋아할만한 콘텐츠

[파이썬] Min vs. If, 좀 더 빠르게 (실행시간 정리) 2023.05.08

새소식

인기 검색어