### Create random.randint with condition in a group by?

112 author reputation

I have a column called "cars" and want to create another column called "persons" using `np.random.randint()`. This is what I have so far:

``````
dat['persons'] = np.random.randint(1, 5, len(dat))
``````

This is so I can record the number of persons who use each car, but I'd like to know how to add a condition so that, for example, rows in the 'suv' category only get numbers from 4 to 9.

``````
cars  | persons
suv     4
sedan   2
truck   2
suv     1
suv     5
``````

### Answers (3)

2

764793 author reputation

You can create a boolean mask for your series, where matching rows are `True` and everything else is `False`. You can then use `loc[]` with that mask to assign to just the matching rows, generating only as many values as there are selected rows:

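``````
m = dat['cars'] == 'suv'  # boolean mask: True for SUV rows
# The upper bound is exclusive, so this draws 4..8; pass 10 to include 9.
dat.loc[m, 'persons'] = np.random.randint(4, 9, m.sum())
``````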

You could also use `apply` on the `cars` series to create the new column, creating a new random value in each call:

``````
import random

# random.randint includes both endpoints, unlike np.random.randint.
dat['persons'] = dat.cars.apply(
    lambda c: random.randint(4, 9) if c == 'suv' else random.randint(1, 5))
``````

But this has to make a separate function call for each row. Using a mask will be more efficient.
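For reference, the mask approach can be sketched end to end as follows. The sample frame, the names `rng` and `is_suv`, and the seed are assumptions for illustration, and the sketch swaps in NumPy's newer `default_rng` generator in place of `np.random.randint`. It also assumes "from 4 to 9" means both endpoints are wanted, hence `high=10`.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the values are illustrative.
dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})

# A seeded generator makes the sketch reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

# Default draw for every row: integers 1..4 (the upper bound is exclusive).
dat["persons"] = rng.integers(1, 5, size=len(dat))

# Boolean mask selecting the SUV rows.
is_suv = dat["cars"] == "suv"

# Overwrite only the masked rows with integers 4..9; high=10 because the
# upper bound is exclusive.
dat.loc[is_suv, "persons"] = rng.integers(4, 10, size=is_suv.sum())

print(dat)
```

Only as many SUV values are drawn as there are SUV rows, so nothing is generated and then thrown away.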

0

171 author reputation

There may be a cleverer way to do this with something like a groupby, but my approach would be to build a function and apply it to your cars column. This is pretty flexible: it is easy to build in more complicated logic if you want something different for each car:

``````
def get_persons(car):
    # np.random.randint excludes the upper bound: 4..8 for SUVs, 1..4 otherwise.
    if car == 'suv':
        return np.random.randint(4, 9)
    else:
        return np.random.randint(1, 5)

dat['persons'] = dat['cars'].apply(get_persons)
``````

Or, more tersely but less flexibly:

``````
dat['persons'] = dat['cars'].apply(
    lambda car: np.random.randint(4, 9) if car == 'suv' else np.random.randint(1, 5))
``````
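Since the question asks about a group by, here is one way the per-category idea might look with `groupby`. This is only a sketch: the `ranges` dictionary, `default_range`, the sample frame, and the seed are hypothetical, and it uses `np.random.default_rng` instead of `np.random.randint`.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the values are illustrative.
dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"]})
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical per-category ranges as (low, high), with high exclusive.
ranges = {"suv": (4, 10)}
default_range = (1, 5)

# Start with an integer column so the per-group assignments keep the dtype.
dat["persons"] = 0

# groupby(...).groups maps each category to the index labels of its rows.
for car, idx in dat.groupby("cars").groups.items():
    low, high = ranges.get(car, default_range)
    # One vectorized draw per category instead of one call per row.
    dat.loc[idx, "persons"] = rng.integers(low, high, size=len(idx))

print(dat)
```

Adding another category is then a matter of adding one entry to `ranges`, and the number of random draws scales with the number of categories rather than the number of rows.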

1

169930 author reputation

Option 1
You're generating random numbers from 1 to 4 (the upper bound of `np.random.randint(1, 5)` is exclusive), whereas numbers in the SUV category should fall between 4 and 9. Since both ranges have the same width, you can generate a random number for every row and then add 4 to the ones belonging to the SUV category, which shifts them to 5 through 8:

``````
df = df.assign(persons=np.random.randint(1, 5, len(df)))
df.loc[df.cars == 'suv', 'persons'] += 4

df

    cars  persons
0    suv        7
1  sedan        3
2  truck        1
3    suv        8
4    suv        8
``````

Option 2
Another alternative is to use `np.where`:

``````
# Bracket assignment works even if the 'persons' column does not exist yet.
df['persons'] = np.where(df.cars == 'suv',
                         np.random.randint(5, 9, len(df)),
                         np.random.randint(1, 5, len(df)))
df

    cars  persons
0    suv        8
1  sedan        1
2  truck        2
3    suv        5
4    suv        6
``````