How to find the position of a substring in a string in PySpark
I'm trying to find positions within a column that looks like this:
Length ID
+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX 1
XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX 2
++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 3
XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX 4
+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX 5
+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX 6
XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX 7
++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 8
XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX 9
+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX 10
I want to find the IDs whose Length has an X at every position from 15 to 25.
I tried using len(length) in both Python and SQL code:
SELECT
ID,CHARINDEX('X','length',15),25)
It takes a very long time to run, and neither approach works well.
I'd like to know whether there is a simpler, more efficient way to do this in PySpark or SQL.
Thanks.
Solution
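Try it this way. I may not have fully understood the question (possibly a language barrier), so here are two interpretations.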
1. Rows where the total number of X characters in Length is between 15 and 25:
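import pyspark.sql.functions as f

# df is the DataFrame from the question (columns Length and ID).
# Splitting on 'X' yields (number of X) + 1 pieces, so size - 1 counts the X characters.
df2 = df.withColumn('len', f.size(f.split('Length', 'X')) - 1)
df2.show(10, False)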
+----------------------------------------------------+----+---+
|Length                                              |ID  |len|
+----------------------------------------------------+----+---+
|+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX|1.0 |13 |
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|2.0 |25 |
|++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|3.0 |34 |
|XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX|4.0 |32 |
|+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX|5.0 |27 |
|+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX|6.0 |13 |
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|7.0 |25 |
|++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|8.0 |34 |
|XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX|9.0 |32 |
|+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX|10.0|27 |
+----------------------------------------------------+----+---+
# Keep only rows with 15 to 25 X characters in total
df2.filter('len BETWEEN 15 AND 25').show(10, False)
+----------------------------------------------------+---+---+
|Length                                              |ID |len|
+----------------------------------------------------+---+---+
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|2.0|25 |
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|7.0|25 |
+----------------------------------------------------+---+---+
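The count above relies on split producing one more piece than there are X characters, including the empty pieces at the start and end of the string (with its default limit, Spark's split keeps those empty pieces, which is why rows ending in X are counted correctly). Python's str.split behaves the same way for a literal separator, so the arithmetic can be sanity-checked on the sample rows without a Spark session. This is a plain-Python illustration, not part of the Spark job, and count_x_by_split is a hypothetical helper name:

```python
# Sample rows from the question (rows 6-10 repeat rows 1-5).
SAMPLE = [
    "+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX",
    "XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX",
    "++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX",
    "+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX",
]


def count_x_by_split(s: str) -> int:
    """Mirror size(split(Length, 'X')) - 1: n separators give n + 1 pieces,
    counting the empty pieces at the start and end of the string."""
    return len(s.split("X")) - 1


counts = [count_x_by_split(s) for s in SAMPLE]
print(counts)  # [13, 25, 34, 32, 27], the len column above

# IDs whose total X count is in [15, 25], as with the BETWEEN filter above
ids_in_range = [i + 1 for i, n in enumerate(counts * 2) if 15 <= n <= 25]
print(ids_in_range)  # [2, 7]
```

Note that this interpretation counts X anywhere in the string, not only in positions 15 to 25.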
2. Rows where every position from 15 to 25 of Length is X: