user3440244 - 2 months ago 19

R Question

After searching the forum, I did not find a good solution for this question. If I missed it, please tell me.

I need to count the unique values in one column in EXCEL 2010.

The worksheet has 1 million rows and 10 columns. All cell values are string or numbers.

I used the solution at Count unique values in a column in Excel

`=SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A100000,A2:A1000000&""))`

But, it runs so long time that the EXCEL is almost frozen. And, it generates 25 processes in Win 7.

Are there more efficient ways to do it?

Also, in the column, all values have for format of

`AX_Y`

here, A is a character, X is an integer, Y is an integer from 1 to 10.

For example, A5389579_10

I need to cut off the part after (including) undersocre. for the example,

`A5389579`

This is what I need to count as unique values in all cells in one column.

`For example, A5389579_10`

A1543848_6

A5389579_8

Here, the unique value has 2 after removing the part after underscore.

How to do it in EXCEL VBA and R (if no efficient solution for EXCEL)?

Answer

If you want to do this by VBA, you can take advantage of the `Collection`

object. Since collections can only contain unique values, trying to add all of your input data to a collection will result in an array of unique values. The code below takes all the variables in a selected range and then outputs an array with distinct values to an other sheet (in this case a sheet named Output).

```
Sub ReturnDistinct()
Dim Cell As Range
Dim i As Integer
Dim DistCol As New Collection
Dim DistArr()
Dim OutSht As Worksheet
Dim LookupVal As String
Set OutSht = ActiveWorkbook.Sheets("Output") '<~~ Define sheet to putput array
If TypeName(Selection) <> "Range" Then Exit Sub
'Add all distinct values to collection
For Each Cell In Selection
If InStr(Cell.Value, "_") > 0 Then
LookupVal = Mid(Cell.Value, 1, InStr(Cell.Value, "_") - 1)
Else
LookupVal = Cell.Value
End If
On Error Resume Next
DistCol.Add LookupVal, CStr(LookupVal)
On Error GoTo 0
Next Cell
'Write collection to array
ReDim DistArr(1 To DistCol.Count, 1 To 1)
For i = 1 To DistCol.Count Step 1
DistArr(i, 1) = DistCol.Item(i)
Next i
'Outputs distinct values
OutSht.Range("A1:A" & UBound(DistArr)).Value = DistArr
End Sub
```

Note that since this code writes all the distinct values to a single column in the `OutSht`

-sheet, this will return an error if there are more than 1,048,576 distinct values in your dataset. In that case you would have to split the data to be filled into multiple output columns.