BAITS.VDJ.tl.process_group_with_neighbor_count

BAITS.VDJ.tl.process_group_with_neighbor_count(group, threshold=0.85, cdr3nt_col='cdr3nt')

Process one group of a grouped BCR dataframe, assigning a cluster ID and a neighbor count to each sequence.

The group holds sequences that share the same Vgene, Jgene, and CDR3 nucleotide length. These sequences are clustered by sequence identity, and for each sequence the function counts its neighbors: the other sequences in the group that differ from it by a single nucleotide.
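Because every sequence in a group has the same CDR3 nucleotide length, a "single nucleotide difference" can be read as a Hamming distance of exactly 1. The sketch below illustrates that reading with a plain pairwise comparison. It is not the BAITS implementation: the helper name `count_single_nt_neighbors` is hypothetical, and it assumes that identical sequences (distance 0) are not counted as neighbors.

```python
def count_single_nt_neighbors(seqs: list[str]) -> list[int]:
    """Hypothetical helper: for each sequence, count the other sequences that
    differ from it at exactly one position (Hamming distance 1)."""
    counts = []
    for i, s in enumerate(seqs):
        n = 0
        for j, t in enumerate(seqs):
            # skip the sequence itself; lengths match within a group, but guard anyway
            if i == j or len(s) != len(t):
                continue
            # Hamming distance: number of positions where the two strings disagree
            mismatches = sum(a != b for a, b in zip(s, t))
            if mismatches == 1:
                n += 1
        counts.append(n)
    return counts


# Made-up CDR3 fragments of equal length; the last one duplicates the first.
seqs = ["TGTGCAAGA", "TGTGCAAGG", "TGTGCTAGG", "TGTGCAAGA"]
print(count_single_nt_neighbors(seqs))  # [1, 3, 1, 1]
```

The pairwise loop is quadratic in group size, which keeps it easy to read. For very large groups, a faster approach is to index each sequence under every single-position masked variant, but this sketch does not attempt that.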

Parameters:
  • group (tuple) – A (key, dataframe) pair of the form ((Vgene, Jgene, CDR3_nt_length), dataframe), where the dataframe holds the rows belonging to that group.

  • threshold (float, default=0.85) – Minimum sequence identity, as a fraction between 0 and 1, required for sequences to be placed in the same cluster.

  • cdr3nt_col (str, default="cdr3nt") – Name of the column containing the CDR3 nucleotide sequences.
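To make the `threshold` value concrete, the sketch below assumes that identity means the fraction of aligned positions at which two equal-length sequences agree. The documentation does not state how BAITS measures identity, so treat this definition as an assumption. Under it, a threshold of 0.85 on a 20-nt CDR3 tolerates up to 3 mismatches (17/20 = 0.85). Both helper names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Assumed definition: fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_mismatches(length: int, threshold: float = 0.85) -> int:
    """Largest mismatch count whose identity is still >= threshold.
    Uses the same division as identity() so both functions agree on borderline cases."""
    k = 0
    while k < length and (length - (k + 1)) / length >= threshold:
        k += 1
    return k


print(max_mismatches(20))                      # 3
print(identity("A" * 20, "A" * 17 + "T" * 3))  # 0.85, so it passes a 0.85 threshold
```

The loop is used instead of `floor(length * (1 - threshold))` because expressions such as `1 - 0.9` are not exact in floating point and can round the answer down by one.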

Returns:

A list with one tuple per sequence in the group. Each tuple contains: (Vgene, Jgene, sequence, cluster_id, neighbor_count)

Return type:

list of tuples
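The sketch below ties the pieces together and shows the documented input and output shapes. It is a hypothetical re-implementation, not the BAITS code: the function name `process_group_sketch`, the greedy clustering rule (each sequence joins the first cluster whose representative it matches at identity >= `threshold`, otherwise it starts a new cluster), integer cluster IDs, and the identity definition are all assumptions. It reads sequences with `group_df[cdr3nt_col]`, so it accepts a pandas DataFrame or, as in the example, a plain dict of lists standing in for one.

```python
def process_group_sketch(group, threshold=0.85, cdr3nt_col="cdr3nt"):
    """Hypothetical stand-in: greedy identity clustering plus single-nucleotide
    neighbor counting for one ((Vgene, Jgene, CDR3_nt_length), dataframe) group."""
    (vgene, jgene, _length), group_df = group
    seqs = list(group_df[cdr3nt_col])  # works for a DataFrame column or a list in a dict

    def identity(a, b):
        # assumed definition: fraction of matching positions (lengths are equal in a group)
        return sum(x == y for x, y in zip(a, b)) / len(a) if a else 1.0

    # Greedy clustering (assumption): join the first cluster whose representative
    # is similar enough, otherwise found a new cluster with this sequence as representative.
    representatives, cluster_ids = [], []
    for s in seqs:
        for cid, rep in enumerate(representatives):
            if identity(s, rep) >= threshold:
                cluster_ids.append(cid)
                break
        else:
            representatives.append(s)
            cluster_ids.append(len(representatives) - 1)

    results = []
    for i, s in enumerate(seqs):
        # neighbors: other sequences differing at exactly one position
        neighbors = sum(
            1 for j, t in enumerate(seqs)
            if j != i and sum(x != y for x, y in zip(s, t)) == 1
        )
        results.append((vgene, jgene, s, cluster_ids[i], neighbors))
    return results


# Made-up example group; the dict stands in for the group's dataframe.
group = (("IGHV1-2", "IGHJ4", 9), {"cdr3nt": ["TGTGCAAGA", "TGTGCAAGG", "AAAAAAAAA"]})
for row in process_group_sketch(group):
    print(row)
```

The first two sequences share 8 of 9 positions (identity of about 0.89), so they fall into cluster 0 and are each other's single neighbor. The third sequence starts cluster 1 and has no neighbors.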