[Troubleshooting] Solving a Concurrency Problem - Optimistic Lock, Pessimistic Lock

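While developing wibby, a study-group management platform and my final project,

I suspected that the study's member count was vulnerable to a concurrency problem.

Identifying the Problem

The code below is the logic for leaving a study.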

@Transactional
public void leaveStudy(Long studyId, Long userId) {
    Study study = findValidStudy(studyId);
    StudyMember studyMember = findValidStudyMember(studyId, userId);
    if (isLastMember(study)) {
        study.delete();
        studyMember.delete();
        return;
    }
    if (studyMember.isLeader()) {
        randomDelegateLeader(studyId, studyMember);
    }

    studyMember.delete();
    study.decreaseMemberCount();
}

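leaveStudy() logic flow

- Look up the Study entity and the leaving user's StudyMember

- Check whether the user is the last member

- If so, delete the study itself

- If not, and the user is the study leader, delegate the leader role to another member

- Deactivate the member

- Decrease the study's currentCount by 1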

Test Result

I expected a concurrency problem if two study members pressed the leave button at the same time.

@DisplayName("Two members leave the study at the same time.")
@Transactional(propagation = Propagation.NOT_SUPPORTED)
@Test
void leaveStudy() throws InterruptedException {
    // given
    Study study = studyRepository.findById(studyId)
        .orElseThrow();
    System.out.println(study.getCurrentCount());

    int threads = 2;
    ExecutorService executorService = Executors.newFixedThreadPool(threads);
    CountDownLatch countDownLatch = new CountDownLatch(threads);

    List<Long> memberIds = studyMemberRepository.findFetchStudyByStudyId(studyId)
        .stream().filter(studyMember -> studyMember.getRole().equals(MEMBER))
        .map(StudyMember::getUserId)
        .toList();

    // when
    for (Long targetUserId : memberIds) {
        executorService.submit(() -> {
            try {
                studyManagementServiceFacade.leaveStudyWithRetry(studyId, targetUserId);
            } finally {
                countDownLatch.countDown();
            }
        });
    }

    countDownLatch.await();
    executorService.shutdown();
    em.clear();

    //then
    Study findStudy = studyRepository.findById(studyId).orElseThrow();
    System.out.println("currentCount after both members left: " + findStudy.getCurrentCount());
    assertThat(findStudy.getCurrentCount()).isEqualTo(1);
}

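I ran a test in which two members leave a study with currentCount=3 at the same time.

1. Both transactions read study.getCurrentCount() at the same time -> both see 3

2. Each transaction calls study.decreaseMemberCount(), computing 3 - 1 = 2 in memory

3. Each transaction issues its own update, so the final value in the DB is 2

Two members actually left, but the count dropped only once, so the real number of leaves was not reflected. This is a concurrency problem.

MySQL (InnoDB) uses Repeatable Read as its transaction isolation level unless it is changed,

and anomalies such as Lost Update and Write Skew can still occur at that level.

What happened here is a Lost Update.

+ Lost Update: when two or more transactions read the same data at the same time, each modifies it, and each writes it back,

the change written first is overwritten by the later write and disappears.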
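The interleaving described above can be replayed deterministically without a database. The sketch below is only a toy model, assuming the "database" is a single int and each "transaction" is one read followed by one write; the class and method names are hypothetical and not part of the wibby code.

```java
// Toy model of a Lost Update. The "database" is one int and each
// "transaction" is a read followed by a write. All names are hypothetical.
class LostUpdateDemo {

    /** Replays the problematic interleaving: both reads happen before either write. */
    static int interleavedLeaves(int initialCount) {
        int dbCount = initialCount;

        int readByTx1 = dbCount;   // tx1 reads the current count
        int readByTx2 = dbCount;   // tx2 reads the same value (tx1 has not written yet)

        dbCount = readByTx1 - 1;   // tx1 writes its decremented value
        dbCount = readByTx2 - 1;   // tx2 writes the same value again, overwriting tx1's update

        return dbCount;
    }

    /** The same two leaves executed strictly one after the other. */
    static int sequentialLeaves(int initialCount) {
        int dbCount = initialCount;
        dbCount = dbCount - 1;     // tx1 reads and writes
        dbCount = dbCount - 1;     // tx2 reads tx1's result and writes
        return dbCount;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleavedLeaves(3));
        System.out.println("sequential:  " + sequentialLeaves(3));
    }
}
```

Starting from 3, the interleaved version ends at 2 while the sequential version ends at 1, which is the gap the test exposed.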

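Pessimistic Lock

Acquires a lock on the row in the database when it is read inside the transaction, so that other transactions trying to lock or modify the same row have to wait until the transaction ends.

- Add the @Lock annotation to the Repository interface method and specify the desired LockModeType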

public interface StudyRepository extends JpaRepository<Study, Long> {
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("select s from Study s where s.id = :id")
    Optional<Study> findByIdWithPessimisticLock(@Param("id") Long id);
}

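The test now passes!

- Thread 1 reads the row with select ... for update and acquires the lock

- Thread 2 tries to read the same row, finds it locked, and waits

- Thread 1 commits, which ends its transaction and releases the lock

- Thread 2 wakes up, reads the row, and acquires the lock

=> No simultaneous execution; the requests are processed sequentially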
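The lock-wait-release sequence above can be imitated in memory. The following is a rough analogy rather than real database locking: a ReentrantLock stands in for the row lock taken by select ... for update, and releasing it stands in for commit. PessimisticLockDemo and its methods are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// In-memory analogy of a pessimistic row lock. Names are hypothetical.
class PessimisticLockDemo {

    private final ReentrantLock rowLock = new ReentrantLock(); // stands in for the DB row lock
    private int currentCount;

    PessimisticLockDemo(int initialCount) {
        this.currentCount = initialCount;
    }

    void leave() {
        rowLock.lock();                 // like "select ... for update": wait until the row is free
        try {
            int read = currentCount;    // the read happens only while the lock is held
            currentCount = read - 1;    // so the write is always based on a fresh value
        } finally {
            rowLock.unlock();           // like commit: the lock is released
        }
    }

    int currentCount() {
        rowLock.lock();
        try {
            return currentCount;
        } finally {
            rowLock.unlock();
        }
    }

    /** Runs the given number of leaves concurrently and returns the final count. */
    static int runConcurrentLeaves(int initialCount, int threads) {
        PessimisticLockDemo study = new PessimisticLockDemo(initialCount);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(study::leave);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return study.currentCount();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentLeaves(3, 2));
    }
}
```

Because every read-modify-write runs while the lock is held, the two leaves can never interleave, at the cost of the second thread waiting for the first.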

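Optimistic Lock

An approach that optimistically assumes most transactions will not conflict.

It is not an actual lock in the DB; lock-like behavior is implemented logically at the application level.

A version column is added, and every update checks that it is modifying the same version of the data that it originally read.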
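The version check can be pictured as a conditional update. The sketch below is an in-memory stand-in that assumes the update behaves roughly like "update study set current_count = ?, version = version + 1 where id = ? and version = ?"; VersionedStudyRow is a hypothetical class, not a JPA type.

```java
// In-memory stand-in for a row guarded by a version column. Hypothetical names.
class VersionedStudyRow {

    private int currentCount;
    private long version;   // starts at 0

    VersionedStudyRow(int currentCount) {
        this.currentCount = currentCount;
    }

    synchronized int currentCount() {
        return currentCount;
    }

    synchronized long version() {
        return version;
    }

    /**
     * Applies the update only if the caller still holds the latest version.
     * Returns false when the version has moved on, which is the situation
     * that JPA reports as an OptimisticLockException.
     */
    synchronized boolean updateIfVersionMatches(int newCount, long expectedVersion) {
        if (version != expectedVersion) {
            return false;       // someone else updated the row first
        }
        currentCount = newCount;
        version++;              // every successful update bumps the version
        return true;
    }

    public static void main(String[] args) {
        VersionedStudyRow row = new VersionedStudyRow(3);

        // Both "transactions" read the row before either one writes.
        int countSeenByTx1 = row.currentCount();
        long versionSeenByTx1 = row.version();
        int countSeenByTx2 = row.currentCount();
        long versionSeenByTx2 = row.version();

        System.out.println(row.updateIfVersionMatches(countSeenByTx1 - 1, versionSeenByTx1)); // true
        System.out.println(row.updateIfVersionMatches(countSeenByTx2 - 1, versionSeenByTx2)); // false
    }
}
```

The second update is rejected instead of silently overwriting the first one, so the caller gets the chance to re-read and try again.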

- Add a version field to the Study entity and annotate it with @Version

@Getter
@Entity
public class Study extends BaseEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    ...

    @Version
    private Long version;
}

- Apply the optimistic lock when looking up a specific Study

public interface StudyRepository extends JpaRepository<Study, Long> {

	...

	@Lock(LockModeType.OPTIMISTIC)
	@Query("SELECT s FROM Study s WHERE s.id = :id")
	Optional<Study> findByIdWithLock(@Param("id") Long id);
}

- while-retry loop inside @Transactional -> fails

@Transactional
public void leaveStudy(Long studyId, Long userId) {
    int retry = 0;
    while (retry < MAX_RETRY) {
        try {
            Study study = studyRepository.findByIdWithLock(studyId)
                .orElseThrow(() -> new NotFoundException(STUDY_NOT_FOUND));
            StudyMember studyMember = getStudyMemberById(studyId, userId);

            if (isLastMember(study)) {
                study.delete();
                studyMember.delete();
                completeRecruitmentPostIfExists(studyId);
                return;
            }

            if (studyMember.isLeader()) {
                randomDelegateLeader(studyId, studyMember);
            }

            studyMember.delete();
            study.decreaseMemberCount();
            return;
        } catch (OptimisticLockException | ObjectOptimisticLockingFailureException e) {
            retry++;
            try {
                Thread.sleep(50);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    throw new BusinessException(STUDY_MEMBER_COUNT_UPDATE_FAILED);
}

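Once the transaction has started, no rollback happens until the while loop finishes.

When an OptimisticLockException is thrown, JPA/Hibernate marks the current transaction as rollback-only,

so nothing further can be updated or committed in that transaction and every subsequent flush fails.

In addition, the version check normally runs at flush or commit time, which usually happens after the method has returned, so the catch block inside the method may never see the exception at all.

So instead of exception -> rollback -> retry,

the flow becomes exception -> retry in the same transaction -> the transaction is already rollback-only, and the test fails.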
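The difference between retrying inside one transaction and retrying with a fresh transaction can be shown with a toy model. ToyTransaction below is a hypothetical stand-in with a single rollback-only flag; it is not Spring's or Hibernate's API and only mirrors the behavior described above.

```java
// Toy model of the rollback-only problem. All names are hypothetical.
class RollbackOnlyDemo {

    /** Minimal stand-in for a transaction: once marked rollback-only it can never commit. */
    static class ToyTransaction {
        private boolean rollbackOnly;

        void markRollbackOnly() {
            rollbackOnly = true;
        }

        boolean commit() {
            return !rollbackOnly;
        }
    }

    /** Retry loop inside ONE transaction: the first conflict poisons every later attempt. */
    static boolean retryInsideSameTransaction(int conflictsBeforeSuccess, int maxRetry) {
        ToyTransaction tx = new ToyTransaction();
        for (int attempt = 0; attempt < maxRetry; attempt++) {
            if (attempt < conflictsBeforeSuccess) {
                tx.markRollbackOnly();   // version conflict detected in this attempt
                continue;                // "retry", but still inside the same transaction
            }
            break;                       // the work itself went through this time
        }
        return tx.commit();              // fails if any earlier attempt hit a conflict
    }

    /** Retry loop OUTSIDE the transaction: every attempt gets a fresh transaction. */
    static boolean retryWithNewTransaction(int conflictsBeforeSuccess, int maxRetry) {
        for (int attempt = 0; attempt < maxRetry; attempt++) {
            ToyTransaction tx = new ToyTransaction();
            if (attempt < conflictsBeforeSuccess) {
                tx.markRollbackOnly();   // this attempt is lost, the next one starts clean
            }
            if (tx.commit()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(retryInsideSameTransaction(1, 3));
        System.out.println(retryWithNewTransaction(1, 3));
    }
}
```

With one conflict and three allowed attempts, the in-transaction loop can never commit, while the loop that opens a new transaction per attempt succeeds on its second try.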

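Optimistic Lock + Separation of Concerns

To solve this, I created a class that is responsible only for the retry logic.

- StudyManagementService: performs the actual business logic, one transaction per call

- StudyManagementServiceFacade: handles only the retry logic (no transaction)

- The StudyManagementServiceFacade class, responsible only for retrying (the transaction and retry concerns are separated)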

@Slf4j
@Service
@RequiredArgsConstructor
public class StudyManagementServiceFacade {

    private final StudyManagementService studyManagementService;
    private static final int MAX_RETRY = 3;

    public void leaveStudyWithRetry(Long studyId, Long userId) {
        int retry = 0;
        while (retry < MAX_RETRY) {
            try {
                studyManagementService.leaveStudy(studyId, userId);
                return;
            } catch (OptimisticLockException | ObjectOptimisticLockingFailureException e) {
                retry++;

                try {
                    Thread.sleep(50);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }

        log.warn("Failed to decrease study currentCount (studyId={}, userId={})", studyId, userId);
        throw new BusinessException(STUDY_MEMBER_COUNT_UPDATE_FAILED);
    }
}

- The leaveStudy() method

@Transactional
public void leaveStudy(Long studyId, Long userId) {
    Study study = studyRepository.findByIdWithLock(studyId)
        .orElseThrow(() -> new NotFoundException(STUDY_NOT_FOUND));
    StudyMember studyMember = getStudyMemberById(studyId, userId);

    if (isLastMember(study)) {
        study.delete();
        studyMember.delete();
        completeRecruitmentPostIfExists(studyId);
        return;
    }

    if (studyMember.isLeader()) {
        randomDelegateLeader(studyId, studyMember);
    }

    studyMember.delete();
    study.decreaseMemberCount();
}

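The test passes!

- Thread 1 and thread 2 read the same row at the same time -> both read version=0

- Thread 1 runs its update first -> the update succeeds and version becomes 1

- Thread 2 also attempts its update -> the DB row is already at version=1, so the version condition does not match and the update fails -> OptimisticLockException

- Thread 2's transaction is rolled back and the facade's retry loop starts a new transaction -> reads again (version=1) -> update succeeds (version=2)

=> Both attempt to run at the same time, but only one succeeds; the other fails and then retries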

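Optimistic Lock + @Retryable

Looking a bit further, I found that a library makes retrying much simpler.

I applied @Retryable so that when the exception occurs, the transaction is rolled back and the call is retried in a new transaction.

@Retryable uses Spring AOP to wrap the whole method call

and invokes the method again when one of the specified exceptions is thrown. Because it wraps the call to the @Transactional leaveStudy(), the failed transaction has already been rolled back by the time the retry starts, and each attempt runs in a new transaction.

Unlike a simple while-loop retry inside a single transaction,

the retry happens after the failed transaction has ended and released its locks.

Also, wrapping every call in a hand-written try-catch leads to duplicated code and complexity,

so I used @Retryable to automate this consistently and retry safely when a conflict occurs.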
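Conceptually, the wrapping described above amounts to a loop around the method call. The helper below is a plain-Java illustration of that idea, assuming a fixed number of attempts, a fixed backoff, and a fallback once the attempts are exhausted; it is not spring-retry's implementation, and SimpleRetry and ConflictException are hypothetical names.

```java
import java.util.function.Supplier;

// Plain-Java illustration of "retry the whole call on a specific exception".
// Not spring-retry's implementation; all names are hypothetical.
class SimpleRetry {

    /** Stands in for an optimistic-lock failure. */
    static class ConflictException extends RuntimeException {
    }

    /**
     * Calls the operation up to maxAttempts times (the first call counts as an attempt),
     * sleeping backoffMillis between attempts. If every attempt throws ConflictException,
     * the recover supplier provides the result instead.
     */
    static <T> T retry(int maxAttempts, long backoffMillis, Supplier<T> operation, Supplier<T> recover) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();          // each call would start its own transaction
            } catch (ConflictException e) {
                if (attempt == maxAttempts) {
                    break;                       // attempts exhausted, fall through to recover
                }
                sleep(backoffMillis);
            }
        }
        return recover.get();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(3, 50, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ConflictException();   // the first two attempts conflict
            }
            return "left study";
        }, () -> "gave up");
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The point of the annotation is that this loop, the backoff, and the fallback no longer have to be written by hand around every call.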

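- Add the dependencies to build.gradle (@EnableRetry must also be declared on a configuration class for @Retryable to take effect)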

implementation 'org.springframework.retry:spring-retry'
implementation 'org.springframework.boot:spring-boot-starter-aop'

- Apply @Retryable to StudyManagementServiceFacade

@Slf4j
@Service
@RequiredArgsConstructor
public class StudyManagementServiceFacade {

    private final StudyManagementService studyManagementService;

    @Retryable(
        retryFor = {OptimisticLockException.class, ObjectOptimisticLockingFailureException.class},
        backoff = @Backoff(delay = 50)
    )
    public void leaveStudyWithRetry(Long studyId, Long userId) {
        studyManagementService.leaveStudy(studyId, userId);
    }

    @Recover
    public void recover(OptimisticLockException e, Long studyId, Long userId) {
        log.warn("Failed to update study currentCount (studyId={}, userId={})", studyId, userId, e);
        throw new BusinessException(STUDY_MEMBER_COUNT_UPDATE_FAILED);
    }

    @Recover
    public void recover(ObjectOptimisticLockingFailureException e, Long studyId, Long userId) {
        log.warn("Failed to update study currentCount (studyId={}, userId={})", studyId, userId, e);
        throw new BusinessException(STUDY_MEMBER_COUNT_UPDATE_FAILED);
    }
}

@Transactional
public void leaveStudy(Long studyId, Long userId) {
    Study study = studyRepository.findByIdWithLock(studyId)
        .orElseThrow(() -> new NotFoundException(STUDY_NOT_FOUND));
    StudyMember studyMember = getStudyMemberById(studyId, userId);

    if (isLastMember(study)) {
        study.delete();
        studyMember.delete();
        completeRecruitmentPostIfExists(studyId);
        return;
    }

    if (studyMember.isLeader()) {
        randomDelegateLeader(studyId, studyMember);
    }

    studyMember.delete();
    study.decreaseMemberCount();
}

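The test passes!

- Thread 1 and thread 2 read the same row at the same time -> both read version=0

- Thread 1 runs its update first -> the update succeeds and version becomes 1

- Thread 2 also attempts its update -> the DB row is already at version=1, so the version condition does not match and the update fails -> OptimisticLockException

- Thread 2's transaction is rolled back and @Retryable starts a new transaction -> reads again (version=1) -> update succeeds (version=2)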

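Conclusion

With a pessimistic lock, the DB locks the row up front to prevent conflicts

and all requests are processed sequentially, but waiting time can grow and deadlocks can occur.

With an optimistic lock, a conflict is detected as an exception when it happens and can be handled by retrying,

or, if the retries fail, by returning an error message to the user (such as "please try again in a moment").

I expected that members joining or leaving a study at exactly the same time would be rare.

For that reason I chose the optimistic lock,

so that on a conflict only one transaction succeeds and the others retry,

which ultimately keeps the data consistent 🎉 🎉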
