Python Iterator based on sorted column with various length

Question

Asked by Zewei Song on 22 July 2022 at 03:06

I am trying to write an iterator class in Python that loops over a tab-separated text file and groups all lines with an identical value in the second column (the file is sorted by that column):


1	A
2	A
3	B
4	B
5	B
6	C
7	C
8	C
9	C
10	D
11	D
12	D

So I would like my iterator to return four lists/tuples, one at a time:

[[1,A],[2,A]]
[[3,B],[4,B],[5,B]]
[[6,C],[7,C],[8,C],[9,C]]
[[10,D],[11,D],[12,D]]

Here is my code:

#%% Iterator
class sequence(object):
    def __init__(self, filePath):
        self.file = open(filePath, 'r')
        self.last = []

    def __iter__(self):
        return self

    def __next__(self):
        self.trunk = [self.last]
        stop_checker = False
        while not stop_checker:
            line = self.file.readline()
            if line:  # a solid line
                line = line.strip('\n').split('\t')
                # Check whether the current line belongs to a different contig
                if self.trunk == [[]]:  # empty trunk, add a new line to it, read next
                    self.trunk=[line]
                elif self.trunk[-1][1] == line[1]:  # contig names matched:
                    self.trunk.append(line)
                else:  # First line of a different contig: return the last trunk
                    self.last = line
                    return self.trunk               
            else:
                raise StopIteration
                return self.trunk
 
a = sequence('tst.txt')
for i in a:
    print(i)

However, the iterator stops before returning the last list, and the result is:

[['1', 'A'], ['2', 'A']]
[['3', 'B'], ['4', 'B'], ['5', 'B']]
[['6', 'C'], ['7', 'C'], ['8', 'C'], ['9', 'C']]
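
The last group goes missing because `raise StopIteration` runs as soon as `readline()` returns an empty string, so the `return self.trunk` after it is never reached and the lines collected for `D` are discarded. Below is a minimal sketch of one possible repair, not the only way to do it. It assumes every line has at least two tab-separated fields; the class name `GroupedLines` and the `_pending` attribute are illustrative and do not appear in the original code.

```python
class GroupedLines:
    """Iterate over a tab-separated file, returning one list per run of
    consecutive lines that share the same value in the second column."""

    def __init__(self, file_path):
        self.file = open(file_path, 'r')
        self._pending = None  # first row of the next group, if already read

    def __iter__(self):
        return self

    def __next__(self):
        if self.file.closed:  # already exhausted on an earlier call
            raise StopIteration
        # Start the group with the row left over from the previous call.
        group = [self._pending] if self._pending is not None else []
        self._pending = None
        for line in self.file:
            row = line.rstrip('\n').split('\t')
            if group and row[1] != group[-1][1]:
                # A new key starts here: keep the row for the next call.
                self._pending = row
                return group
            group.append(row)
        # End of file: hand back the final group before stopping.
        if group:
            return group
        self.file.close()
        raise StopIteration
```

With the sample file above, `for i in GroupedLines('tst.txt'): print(i)` should print all four groups, including the `D` rows.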

Original Q&A

There are 2 answers

**Zewei Song** · Answer 1 · 2022-07-22T03:26:38+00:00

Thanks to the comment from Blckknight, I worked it out with itertools.groupby:

import itertools

# Key function
key_func = lambda x: x.strip('\n').split('\t')[1]

with open('tst.txt', 'r') as f:
    for key, group in itertools.groupby(f, key_func):
        print(key + " :", [i.strip('\n').split('\t') for i in list(group)])

The output:

A : [['1', 'A'], ['2', 'A']]
B : [['3', 'B'], ['4', 'B'], ['5', 'B']]
C : [['6', 'C'], ['7', 'C'], ['8', 'C'], ['9', 'C']]
D : [['10', 'D'], ['11', 'D'], ['12', 'D']]
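
To get back exactly the lists asked for in the question, the same idea can be wrapped in a generator. This is a sketch under stated assumptions: the function name `grouped_rows` and its `column` and `sep` parameters are hypothetical, and each line is assumed to contain the grouping column.

```python
import itertools


def grouped_rows(lines, column=1, sep='\t'):
    """Yield one list of split rows per run of consecutive lines
    that share the same value in `column`."""
    # Split each line once, then group on the chosen column.
    rows = (line.rstrip('\n').split(sep) for line in lines)
    for _, group in itertools.groupby(rows, key=lambda row: row[column]):
        yield list(group)


# Any iterable of lines works, including an open file object.
sample = ['1\tA\n', '2\tA\n', '3\tB\n', '4\tB\n', '5\tB\n']
for chunk in grouped_rows(sample):
    print(chunk)
# [['1', 'A'], ['2', 'A']]
# [['3', 'B'], ['4', 'B'], ['5', 'B']]
```

Note that `itertools.groupby` only merges adjacent items with equal keys, so this relies on the file already being sorted by the grouping column, as it is here.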

**maya** · Answer 2 · 2022-07-22T03:31:28+00:00

Grouping can be done using pandas:

import pandas as pd

df = pd.DataFrame({"num": range(1, 13), "value": ["A"] * 2 + ["B"] * 3 + ["C"] * 4 + ["D"] * 3})

res = [list(zip(item["num"], item["value"])) for i, item in df.groupby("value")]
for item in res:
    print(item)

OUTPUT:

[(1, 'A'), (2, 'A')]
[(3, 'B'), (4, 'B'), (5, 'B')]
[(6, 'C'), (7, 'C'), (8, 'C'), (9, 'C')]
[(10, 'D'), (11, 'D'), (12, 'D')]
