data missing in AbstractFileOutPutOperator

data missing in AbstractFileOutPutOperator

chiru
Hi Team,

In my use case I'm seeing a random issue when writing data to HDFS.

I'm using AbstractFileOutPutOperator to write data to HDFS, and an upstream operator generates the data. In the DT console we can see that the writer operator processed 100 tuples, but in HDFS we can see only 80-90 records. When we kill or shut down the application, we get 95-100 records.

It's random behavior; the same tuple is not missing in every run. Please suggest how we can write all incoming tuples to HDFS without killing or shutting down the app.

Let me know if you need more information.

Thanks
Chiranjeevi V 
Re: data missing in AbstractFileOutPutOperator

Venkatesh Kottapalli
You can try adding logs in the operator to see where your records are getting skipped.

-Venky.


> On Aug 9, 2017, at 7:08 AM, chiranjeevi vasupilli <[hidden email]> wrote:
>
> Hi Team,
>
> In my use case I'm seeing a random issue when writing data to HDFS.
>
> I'm using AbstractFileOutPutOperator to write data to HDFS, and an upstream operator generates the data. In the DT console we can see that the writer operator processed 100 tuples, but in HDFS we can see only 80-90 records. When we kill or shut down the application, we get 95-100 records.
>
> It's random behavior; the same tuple is not missing in every run. Please suggest how we can write all incoming tuples to HDFS without killing or shutting down the app.
>
> Let me know if you need more information.
>
> Thanks
> Chiranjeevi V

Re: data missing in AbstractFileOutPutOperator

chiru
Hi Venky,

The records are not skipped intentionally; after shutting down the application we do get the tuples.
But sometimes a few tuples are missing. We have identified those missing tuples and tested them. We don't have any conditions that would drop those records.

The other issue is that it is random: the same record is not missing in all the runs. In the DT console we can see that the writer operator processed the required records, but the count does not match the data written to HDFS.

At the operator the count is 500, but in HDFS we have 450 before shutdown. Once we shut down the application, the data is flushed to HDFS, but only around 495-500 records. A very few tuples are missing, but not every time.


Thanks
Chiranjeevi V. 

On Wed, Aug 9, 2017 at 10:01 PM, Venkatesh Kottapalli <[hidden email]> wrote:
You can try adding logs in the operator to see where your records are getting skipped.

-Venky.


--
ur's
chiru

Re: data missing in AbstractFileOutPutOperator

Sanjay Pujare
When the few tuples are missing, are they always the trailing ones or random ones in between?

Also, are you shutting down the app or killing it? Which of the two are you doing when you see missing tuples?

On Wed, Aug 9, 2017 at 11:03 PM, chiranjeevi vasupilli <[hidden email]> wrote:
Hi Venky,

The records are not skipped intentionally; after shutting down the application we do get the tuples.
But sometimes a few tuples are missing. We have identified those missing tuples and tested them. We don't have any conditions that would drop those records.

The other issue is that it is random: the same record is not missing in all the runs. In the DT console we can see that the writer operator processed the required records, but the count does not match the data written to HDFS.

At the operator the count is 500, but in HDFS we have 450 before shutdown. Once we shut down the application, the data is flushed to HDFS, but only around 495-500 records. A very few tuples are missing, but not every time.


Thanks
Chiranjeevi V. 


Re: data missing in AbstractFileOutPutOperator

chiru
I hope you got the issue.
It's a random issue: the same record is not missing in all the runs. In the DT console we can see that the writer operator processed the required records, but the count does not match the data written to HDFS.

We are not sure about the sequence of the missing tuples, because the behavior is random. We are using shutdown, not kill.




Re: data missing in AbstractFileOutPutOperator

Vivek Bhide
Hi Chiru,

Have you tried waiting until your output file in HDFS rolls over (from .tmp to .0)? I have observed in our case that if you query the .tmp file, it may not show all the records written to it. The reason could be that the file is still eligible to be written to and the output operator still holds the handle to it. Can you wait until the file rolls over and then verify whether the tuples are still missing?

Regards
Vivek
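
The effect Vivek describes can be reproduced with an ordinary local file. The following is only a rough analogy in Python, not Apex or HDFS code: the file names part.tmp and part.0 and the function names are made up for illustration, and the local rename merely stands in for the operator's rollover. It shows a reader seeing fewer records than the writer has accepted while the writer still holds the file open, and seeing all of them once the file is closed and renamed.

```python
import os
import tempfile


def count_lines(path):
    """Count the records a separate reader can currently see in the file."""
    with open(path, "r") as reader:
        return sum(1 for _ in reader)


def write_and_observe(directory, total=100):
    """Write `total` small records and report what a reader sees at each stage."""
    tmp_path = os.path.join(directory, "part.tmp")   # hypothetical in-progress name
    final_path = os.path.join(directory, "part.0")   # hypothetical rolled-over name

    # Default buffering: small writes are held in memory, not pushed to the file.
    writer = open(tmp_path, "w")
    for i in range(total):
        writer.write(f"record-{i}\n")

    # The writer still holds the handle, so a reader may see only part of the data.
    seen_while_open = count_lines(tmp_path)

    writer.close()                     # closing flushes whatever is still buffered
    os.rename(tmp_path, final_path)    # stand-in for the .tmp to .0 rollover
    seen_after_rollover = count_lines(final_path)
    return seen_while_open, seen_after_rollover


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before, after = write_and_observe(d)
        print(f"visible while open: {before}, visible after rollover: {after}")
```

If the same pattern applies to the writer operator, counting records in the .tmp file will under-report until the file is finalized, which would match the gap between the DT console count and the HDFS count before shutdown. It would not by itself explain records that are still missing after a clean shutdown.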