Create random.randint with condition in a group by?

python pandas

I have a column called "cars" and want to create another one called "persons" using random.randint(). This is what I have:

dat['persons']=np.random.randint(1,5,len(dat))

The column should hold the number of people who use each car, but I'd like to add a condition so that, for example, only numbers from 4 to 9 are generated for the 'suv' category.

cars  | persons
suv   | 4
sedan | 2
truck | 2
suv   | 1
suv   | 5
Asked by J_p on December 27, 2017

3 Answers


Score: 2 (accepted answer)

You can create a boolean mask for your series, where matching rows are True and everything else is False. Then use loc[] with that mask to select the matching rows and assign to them, generating only as many values as there are selected rows (m.sum()):

m = dat['cars'] == 'suv'
# np.random.randint excludes the upper bound, so use 10 to allow 9
dat.loc[m, 'persons'] = np.random.randint(4, 10, m.sum())
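A self-contained sketch of the mask approach, assuming a small DataFrame built from the table in the question. It swaps in NumPy's newer Generator API (np.random.default_rng) for the legacy np.random.randint; Generator.integers also excludes high by default, so endpoint=True is passed to make 9 reachable. The seed is only there for reproducibility.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumption).
dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})

rng = np.random.default_rng(seed=0)

# Default draw for every row: 1..4 (high=5 is exclusive, as in the question).
dat["persons"] = rng.integers(1, 5, size=len(dat))

# Boolean mask selecting the SUV rows.
m = dat["cars"] == "suv"

# Overwrite only the SUV rows; m.sum() is the number of SUV rows.
# endpoint=True makes the upper bound inclusive, giving 4..9.
dat.loc[m, "persons"] = rng.integers(4, 9, size=m.sum(), endpoint=True)

print(dat)
```

Only the SUV rows are redrawn, so the non-SUV values from the first draw are left untouched.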

You could also use apply() on the cars series to create the new column, drawing a new random value in each call. Note that Python's random.randint includes both endpoints, unlike np.random.randint:

import random

# random.randint is inclusive: 4..9 for SUVs, 1..4 for everything else
dat['persons'] = dat.cars.apply(
    lambda c: random.randint(4, 9) if c == 'suv' else random.randint(1, 4))

But this makes a separate Python function call for each row. The mask approach is vectorized and will be more efficient.
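If several categories each need their own range, the mask idea can be generalized without a per-row function call: map each row's category to a lower and an upper bound, then pass those arrays to Generator.integers, which broadcasts array-valued low/high. The bound dictionaries below are hypothetical examples, and categories missing from them fall back to 1..4 in this sketch.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})

# Hypothetical inclusive bounds per category; unlisted cars use 1..4.
low_by_car = {"suv": 4, "truck": 2}
high_by_car = {"suv": 9, "truck": 3}

# Series.map with a dict leaves NaN for unmatched keys, so fill in the defaults.
low = dat["cars"].map(low_by_car).fillna(1).astype(int).to_numpy()
high = dat["cars"].map(high_by_car).fillna(4).astype(int).to_numpy()

rng = np.random.default_rng(seed=0)

# One vectorized draw: each row gets its own [low, high] range (endpoint=True).
dat["persons"] = rng.integers(low, high, endpoint=True)

print(dat)
```

This plays the role of a "group by" without grouping explicitly: each row simply looks up its own bounds.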

Answered by Martijn Pieters on December 27, 2017

Score: 0

There may be a cleverer way to do this with something like a groupby, but my approach would be to write a function and apply it to your cars column. This is quite flexible, and it is easy to build in more complicated logic if you want something different for each car:

import numpy as np

def get_persons(car):
    # np.random.randint excludes the upper bound
    if car == 'suv':
        return np.random.randint(4, 10)  # 4..9
    else:
        return np.random.randint(1, 5)   # 1..4

dat['persons'] = dat['cars'].apply(get_persons)
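As one way to build in that "more complicated logic", the per-car rules can live in a lookup table instead of an if/else chain. PERSON_RANGES and DEFAULT_RANGE are hypothetical names and values; each high value is one past the largest allowed count because np.random.randint excludes it.

```python
import numpy as np
import pandas as pd

# Hypothetical (low, high) pairs, high EXCLUSIVE as in np.random.randint.
PERSON_RANGES = {"suv": (4, 10), "truck": (1, 3)}
DEFAULT_RANGE = (1, 5)  # any other car: 1..4


def get_persons(car):
    """Draw a passenger count using the range configured for this car type."""
    low, high = PERSON_RANGES.get(car, DEFAULT_RANGE)
    return np.random.randint(low, high)


dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})
dat["persons"] = dat["cars"].apply(get_persons)
print(dat)
```

Adding a new car type then means adding one dictionary entry rather than another branch.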

Or, more concisely but less flexibly:

dat['persons'] = dat['cars'].apply(
    lambda car: np.random.randint(4, 10) if car == 'suv' else np.random.randint(1, 5))

Answered by Jacob H on December 27, 2017

Score: 1

Option 1
So you're generating random numbers between 1 and 5, whereas numbers in the SUV category should be between 4 and 9. That means you can generate a random number and then add 4 to every random number in the SUV category. (Since np.random.randint excludes the upper bound, the base draw is 1 to 4, so the shifted SUV values fall between 5 and 8.)

df = df.assign(persons=np.random.randint(1,5, len(df)))
df.loc[df.cars == 'suv', 'persons'] += 4

df

    cars  persons
0    suv        7
1  sedan        3
2  truck        1
3    suv        8
4    suv        8
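To check the bounds of the shift approach empirically, a groupby over a large synthetic sample shows the observed minimum and maximum per category. The 10,000-row sample is an assumption made only for this check, and it uses NumPy's Generator API in place of np.random.randint.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Large synthetic sample (assumption) so every possible value is likely to appear.
df = pd.DataFrame({"cars": rng.choice(["suv", "sedan", "truck"], size=10_000)})

df = df.assign(persons=rng.integers(1, 5, size=len(df)))  # base draw: 1..4
df.loc[df.cars == "suv", "persons"] += 4                   # SUV rows shifted to 5..8

# Observed range per category.
summary = df.groupby("cars")["persons"].agg(["min", "max"])
print(summary)
```

A shift moves the range but keeps its width, so a 1..4 draw can never cover all of 4..9.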

Option 2
Another alternative is np.where:

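# np.where picks element-wise from two full-length draws:
# SUV rows get 5..8, all other rows get 1..4 (upper bounds are exclusive)
df['persons'] = np.where(df.cars == 'suv',
                         np.random.randint(5, 9, len(df)),
                         np.random.randint(1, 5, len(df)))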
df

    cars  persons
0    suv        8
1  sedan        1
2  truck        2
3    suv        5
4    suv        6
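With more than two categories, np.select generalizes np.where: it takes a list of conditions and a matching list of full-length draws, and the first matching condition wins for each row. The truck range below is a hypothetical example; a final all-True condition acts as the catch-all for every other car.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})
n = len(df)

conditions = [
    df["cars"].eq("suv").to_numpy(),
    df["cars"].eq("truck").to_numpy(),
    np.ones(n, dtype=bool),          # catch-all for any other car
]
choices = [
    np.random.randint(5, 9, n),      # SUV: 5..8 (high exclusive)
    np.random.randint(1, 3, n),      # truck (hypothetical): 1..2
    np.random.randint(1, 5, n),      # everything else: 1..4
]

df["persons"] = np.select(conditions, choices)
print(df)
```

Like np.where, this draws a value for every row in every choice array and discards the unused ones, which is usually cheap compared with a per-row apply.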
Answered by cs95 on December 27, 2017